Abstract

We consider computations of a Turing machine under noise that causes
consecutive violations of the machine's transition function. Given a
constant upper bound $\beta$ on the size of bursts of faults, we construct a
Turing machine $M(\beta)$ subject to faults that can simulate any fault-free
machine under the condition that bursts are not closer to each other than
$V$ for an appropriate $V = O(\beta^2)$.

1 Introduction

1.1 The problem



Little is known about the behavior and the power of Turing machines when 
their operation is subjected to noise that can change arbitrarily the state 
and the content of the cell where the head is positioned. The main open 
question, under every noise model, is whether a machine subject to it can 
perform arbitrary computations reliably. 

Here, we construct a Turing machine that — with a slowdown by a multi- 
plicative constant — can simulate any other Turing machine even if the sim- 
ulator is subjected to constant size bursts of faults separated by a certain 
minimum number of steps from each other.

The problem of constructing fault-proof machines from components that
can fail was first considered by von Neumann in [11], who addressed the 
problem in the Boolean circuits model. New advances along this path were 
made in [8, 9]. The question has been considered in uniform models of 
computation as well. A simple rule for two-dimensional cellular automata 
that keeps one bit forever even though each cell can fail with some small 
probability was given in [10]. A 3-dimensional reliably computing cellular 
automaton using Toom's rule was constructed in [5]. Alas, all simple one- 
dimensional cellular automata appear to be "ergodic" (forgetting everything 
about their initial configuration in time independent of the size). The first, 
complex, nonergodic cellular automaton was constructed in [3], and improved 
upon in [4]. It supports a hierarchical organization, based on an idea given 
in [6]. Cells are organized in units that perform fault-tolerant simulation 
of another automaton (of the same kind). The latter simulates even more 
reliably a third automaton of a similar kind, and so on. 

The question of reliable computation with Turing machines (where ar- 
bitrarily large bursts may occur with correspondingly small probability) is 
raised in [4]. As in the case of one-dimensional cellular automata, no simple
solution to this problem appears to exist. The present paper's machine is 
intended as a building block towards the eventual (hierarchical) construction 
of such a machine. This follows the paradigm of the proof in [3], where each 
member of the hierarchy of simulations is a similar building block, coping 
with distant bursts. To the best of our knowledge, this is the first construc- 
tion of a sequential machine, reliable in a similar sense. 

The title of [1] suggests some connection, but that paper's interest is 
completely different: it examines the expressional ability, in terms of the 
arithmetical hierarchy, of Turing machines whose storage tapes are exposed 
to stochastic noise that tends to zero. 

1.2 Simulating cellular automata 

It is natural to try to derive fault-tolerant Turing machines from the exist- 
ing results on fault-tolerant cellular automata. Cannot one simply define a 
Turing machine that simulates a fault-tolerant cellular automaton? In some 
sense, the answer is yes. Suppose that we know in advance the memory
requirement $m = S(x)$ of a computation on a fault-tolerant cellular
automaton $M$, on input $x$. Then we can define a special Turing machine
$T(m)$, working on a circular tape of size $m$, with the head moving always
in the right direction (in other words, the "oblivious" property is
hardwired), where each pass of $T$ over the tape simulates one step of $M$.
This machine will clearly have the same fault-tolerance properties that $M$
has.

The circular Turing machine has a strong fault-tolerant behavior (with 
a sophisticated transition rule, coming from the cellular automaton it sim- 
ulates). Our efforts on fault-tolerant Turing machines can be seen as just 
aiming to remove the limitation of the circular tape (input size-dependent
hardware).

In view of the above, it would be sufficient to define fault-tolerant sweep- 
ing behavior on a regular tape (once the head can change direction, the 
sweeping movement can be disturbed by faults): the rest can be done by 
simulating a cellular automaton. We were, however, not able to do this with- 
out recreating the hierarchical constructions used in cellular automata — with 
all the necessary changes for Turing machines. 

1.3 Turing machines 

Our contribution uses one of the standard definitions of a Turing machine, 
with the exception of no halting state. 

Definition 1.1. A Turing machine $M$ is defined by a tuple

$$(\Gamma, \Sigma, \delta, q_{\mathrm{start}}, F).$$

Here, $\Gamma$ is a finite set of states, $\Sigma$ is a finite alphabet used
in cells of the tape, and

$$\delta : \Sigma \times \Gamma \to \Sigma \times \Gamma \times \{-1, 0, +1\}$$

is a transition function. The tape alphabet $\Sigma$ contains at least the
distinguished symbols $\sqcup, 0, 1$ where $\sqcup$ is called the blank
symbol. The distinguished state $q_{\mathrm{start}}$ is called the starting
state. The set $F$ of final states has the property that whenever $M$ enters
a state in $F$, it can only continue from there to another state in $F$,
without changing the tape.

A configuration is a tuple

$$(q, h, x),$$

where $q \in \Gamma$, $h \in \mathbb{Z}$ and $x \in \Sigma^{\mathbb{Z}}$.
Here, $x[p]$ is the content of the tape cell at position $p$. The tape is
blank at all but finitely many positions. The work of the machine can be
described as a sequence of configurations $C_0, C_1, C_2, \dots$, where
$C_t$ is the configuration at time $t$. If $C = (q, h, x)$ is a
configuration then we will write

$$C.\mathrm{state} = q, \quad C.\mathrm{pos} = h, \quad C.\mathrm{tape} = x.$$

Here, $x$ is also called the tape configuration. ⌟

Though the tape alphabet may contain non-binary symbols, we will re- 
strict input and output to binary. 

Definition 1.2. For an arbitrary binary string $x$, let

$$M(x, t) \tag{1.1}$$

denote the configuration at time $t$, when started from a binary input
string $x$ written on the tape starting from position 0, with head position
0 and the starting state. Thus, the symbol at tape position $p$ at time $t$
can be written

$$M(x, t).\mathrm{tape}[p].$$

The transition function $\delta$ tells us how to compute the next
configuration from the present one. When the machine is in a state $q$, at
tape position $h$, and observes a tape cell with content $a$, then denoting

$$(a', q', j) = \delta(a, q),$$

it will change the state to $q'$, change the tape cell content to $a'$ and
move to tape position $h + j$. For $q \in F$ we have $a' = a$, $q' \in F$. ⌟
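
The step semantics of Definitions 1.1 and 1.2 can be made concrete with a
small sketch. Everything named here (the `Config` class, the `"_"` stand-in
for the blank symbol, the toy machine `FLIP`) is an assumption chosen for
illustration and is not part of the construction of the paper.

```python
from dataclasses import dataclass

BLANK = "_"  # stand-in for the blank symbol (notation assumed for this sketch)

@dataclass(frozen=True)
class Config:
    state: str   # C.state
    pos: int     # C.pos
    tape: dict   # C.tape: position -> symbol, unlisted positions are blank

def start_config(x, q_start="start"):
    """M(x, 0): x written from position 0, head at 0, starting state."""
    return Config(q_start, 0, dict(enumerate(x)))

def step(delta, c):
    """One fault-free step: (a', q', j) = delta(a, q)."""
    a = c.tape.get(c.pos, BLANK)
    a_new, q_new, j = delta[(a, c.state)]
    tape = dict(c.tape)
    tape[c.pos] = a_new
    return Config(q_new, c.pos + j, tape)

def run(delta, x, t, q_start="start"):
    """M(x, t): the configuration after t fault-free steps."""
    c = start_config(x, q_start)
    for _ in range(t):
        c = step(delta, c)
    return c

# Hypothetical toy machine: flip every input bit, then stay in the final
# state "done" without changing the tape.
FLIP = {
    ("0", "start"): ("1", "start", +1),
    ("1", "start"): ("0", "start", +1),
    (BLANK, "start"): (BLANK, "done", 0),
    (BLANK, "done"): (BLANK, "done", 0),
}

c = run(FLIP, "0110", 10)
print(c.state, "".join(c.tape[p] for p in range(4)))  # done 1001
```

Storing the tape as a dictionary keeps the "blank at all but finitely many
positions" condition implicit: any position not listed reads as blank.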

We say that a fault occurs at time $t$ if the output $(a', q', j)$ of the
transition function at this time is replaced with some other value (which is
then used to compute the next configuration). For the sake of a clean
definition of simulations, we will be more formal in defining fault-free
histories.

Definition 1.3 (Trajectory). Let

$$\mathrm{Configs}_M$$

denote the set of all possible configurations of a Turing machine $M$.
Consider a sequence $\eta = (\eta(0), \eta(1), \dots)$ of configurations of
$M = (\Gamma, \Sigma, \delta)$ with $\eta(t) = (q(t), h(t), x(t))$. This
sequence will be called a history of $M$ if the following conditions hold:

• $q(0) = q_{\mathrm{start}}$.

• $x(t+1)[n] = x(t)[n]$ for all $n \ne h(t)$.

• $h(t+1) - h(t) \in \{-1, 0, 1\}$.

Let

$$\mathrm{Histories}_M$$

denote the set of all possible histories of $M$. A history $\eta$ with
$\eta(t) = (q(t), h(t), x(t))$ of $M$ is called a trajectory of $M$ if for
all $t$ we have

$$(x(t+1)[h(t)],\ q(t+1),\ h(t+1) - h(t)) = \delta(x(t)[h(t)],\ q(t)). \tag{1.2}$$

We say that a history has a fault at time $t$ if (1.2) is violated at time
$t$. (Thus, if a history has any one fault, it is not a trajectory.) A burst
of faults of a history is a sequence of times containing some faults. ⌟
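
As an illustration of Definition 1.3, the following sketch checks the
history conditions on a finite prefix and lists the times at which (1.2) is
violated. The representation of configurations as `(q, h, x)` tuples with a
dictionary tape, the `"_"` blank, and the toy transition table `DELTA` are
assumptions made only for this example.

```python
BLANK = "_"  # stand-in for the blank symbol (assumed notation)

def read(x, p):
    """Content of tape cell p; unlisted cells are blank."""
    return x.get(p, BLANK)

def is_history(eta, q_start):
    """Check the three history conditions on a finite prefix eta."""
    if eta[0][0] != q_start:
        return False
    for (q, h, x), (q2, h2, x2) in zip(eta, eta[1:]):
        if h2 - h not in (-1, 0, 1):
            return False
        others = (set(x) | set(x2)) - {h}
        if any(read(x, n) != read(x2, n) for n in others):
            return False
    return True

def fault_times(delta, eta):
    """Times t at which equation (1.2) is violated."""
    faults = []
    for t, ((q, h, x), (q2, h2, x2)) in enumerate(zip(eta, eta[1:])):
        expected = delta[(read(x, h), q)]
        actual = (read(x2, h), q2, h2 - h)
        if actual != expected:
            faults.append(t)
    return faults

# Hypothetical one-state machine that flips bits while moving right.
DELTA = {
    ("0", "s"): ("1", "s", +1),
    ("1", "s"): ("0", "s", +1),
    (BLANK, "s"): (BLANK, "s", 0),
}
eta = [
    ("s", 0, {0: "0", 1: "1"}),
    ("s", 1, {0: "1", 1: "1"}),  # correct step
    ("s", 0, {0: "1", 1: "1"}),  # fault: cell 1 not flipped, head moved left
    ("s", 1, {0: "0", 1: "1"}),  # correct step from the damaged configuration
]
print(is_history(eta, "s"), fault_times(DELTA, eta))  # True [1]
```

Note that the step following the fault is judged against the damaged
configuration, so a single fault is reported, at time 1.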

With the earlier notation (1.1), if $x \in \Sigma^*$ is a string of nonblank
tape symbols, then the history defined by

$$\eta(t) = M(x, t)$$

for all $t$ is a trajectory in which $\eta(0)$ is a starting tape
configuration obtained by surrounding $x$ with blanks.

1.4 Codes 

To define simulation of a noise-free machine $M_2$ by a noisy machine $M_1$,
we need to specify the correspondence between configurations of these two
machines. After a burst, the state of the machine, as well as the state of
cells where the head was during the burst, could have been changed in an
arbitrary way. To proceed with the simulation, the simulating machine must
recover the information lost. Redundant storage will help. In Section 3, we
will specify how one step of $M_2$ is simulated by a bounded number of steps
of $M_1$.

We formalize redundant storage with the help of codes. 

Definition 1.4 (Code). Let $\Sigma_1, \Sigma_2$ be two finite alphabets. A
block code is given by a positive integer $Q$, an encoding function
$\varphi_* : \Sigma_2 \to \Sigma_1^Q$ and a decoding function
$\varphi^* : \Sigma_1^Q \to \Sigma_2$ with the property
$\varphi^*(\varphi_*(x)) = x$. ⌟



Definition 1.5 (Standard pairing). For every alphabet $\Sigma$ that we will
consider, we assume that there is a standard ordering of its elements:
$\Sigma = \{s_1, \dots, s_n\}$. This gives rise to a code

$$(\gamma_*, \gamma^*),$$

where $\gamma_*(s_i)$ is the base 2 notation of the number $i$, padded from
the front to length $\lceil \log n \rceil$. For example, if
$\Sigma = \{s_1, s_2, s_3\}$ then the codewords are 01, 10, 11.

For a (possibly empty) binary string $x = x(1) \cdots x(n)$ let us introduce
the map

$$x^o = x(1)x(1)x(2)x(2) \cdots x(n)x(n)01.$$

If $s$ is a symbol in some alphabet $\Sigma$ then by $\langle s \rangle$ we
will understand $(\gamma_*(s))^o$, and call it the standard prefix-free code
of $s$. Similarly,

$$\langle s, t \rangle = (\gamma_*(s))^o (\gamma_*(t))^o,$$

$$\langle s, t, u \rangle = \langle s, \langle t, u \rangle \rangle,$$

and so on. ⌟

We have $|x^o| = 2|x| + 2$. There are shorter codes with the same
prefix-free property, but minimizing the code length is not our concern here.
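
A minimal sketch of the doubling code may help to see why it is prefix-free.
It assumes that the two extra symbols accounting for the length $2|x| + 2$
are the terminator 01 (any pair of unequal bits would serve the same
purpose); the function names `gamma`, `double`, `pair` and `split_first` are
hypothetical.

```python
from math import ceil, log2

def gamma(i, n):
    """Base-2 notation of i, padded from the front to length ceil(log2 n).

    Only meant for cases like the n = 3 example of the text; when n is a
    power of two, the number n itself would need one more bit.
    """
    return format(i, "b").zfill(ceil(log2(n)))

def double(x):
    """x^o: every bit doubled, followed by the assumed terminator 01."""
    return "".join(b + b for b in x) + "01"

def pair(*parts):
    """Concatenation of the doubled codes of the given binary strings."""
    return "".join(double(p) for p in parts)

def split_first(s):
    """Read one doubled code off the front of s; return (x, rest)."""
    x, i = [], 0
    while s[i] == s[i + 1]:      # equal pairs carry one payload bit each
        x.append(s[i])
        i += 2
    return "".join(x), s[i + 2:]  # the unequal pair is the terminator

print([gamma(i, 3) for i in (1, 2, 3)])    # ['01', '10', '11']
print(double("101"))                       # 11001101
print(split_first(double("101") + "0011")) # ('101', '0011')
```

Because the decoder stops at the first unequal pair, no codeword can be a
proper prefix of another, which is what allows several codes to be
concatenated as in `pair`.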

Definition 1.6 (Error-correcting code). A block code is
$(\beta, t)$-burst-error-correcting, if for all $x \in \Sigma_2$,
$y \in \Sigma_1^Q$ we have $\varphi^*(y) = x$ whenever $y$ differs from
$\varphi_*(x)$ in at most $t$ intervals of size $\le \beta$. ⌟

Example 1.7 (Tripling). Suppose that $Q \ge 3\beta$ is divisible by 3,
$\Sigma_2 = \Sigma_1^{Q/3}$, $\varphi_*(x) = xxx$. Let $\varphi^*(y)$ be
obtained as follows. If $y = y(1) \cdots y(Q)$, then $x = \varphi^*(y)$ is
defined by $x(i) = \mathrm{maj}(y(i), y(i + Q/3), y(i + 2Q/3))$. For all
$\beta \le Q/3$, this is a $(\beta, 1)$-burst-error-correcting code.

If we repeat 5 times instead of 3, we get a
$(\beta, 2)$-burst-error-correcting code (there are also much more efficient
such codes than just repetition). ⌟
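
The repetition code of Example 1.7 is easy to try out. The sketch below is
only an illustration: the helper `burst`, which overwrites an interval with
a junk symbol, and the concrete parameters are assumptions of this example.

```python
from collections import Counter

def encode(x, copies=3):
    """Repetition code: the codeword is x written `copies` times."""
    return list(x) * copies

def decode(y, copies=3):
    """Position-wise majority over the `copies` repetitions."""
    m = len(y) // copies
    return [Counter(y[i + k * m] for k in range(copies)).most_common(1)[0][0]
            for i in range(m)]

def burst(y, start, beta, junk="#"):
    """Overwrite an interval of length beta starting at `start`."""
    y = list(y)
    for i in range(start, min(start + beta, len(y))):
        y[i] = junk
    return y

x = list("0110")

# Tripling: Q = 12, beta = 4 <= Q/3, one burst straddling two copies.
y3 = burst(encode(x, 3), start=2, beta=4)
print(decode(y3, 3) == x)   # True

# Five repetitions: Q = 20, two bursts of size 4.
y5 = burst(burst(encode(x, 5), 2, 4), 10, 4)
print(decode(y5, 5) == x)   # True
```

A burst of length at most $Q/3$ (respectively $Q/5$) can hit each position
$i$ in at most one copy, so one burst leaves two of three copies intact, and
two bursts leave three of five.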

We will also need a more general majority function later on: 

Definition 1.8. Let $x = (x_1, \dots, x_n)$ be a sequence of symbols from a
finite alphabet $\Sigma = \{a_1, a_2, \dots, a_m\}$. For each
$j = 1, 2, \dots, m$, let $k_j$ be the number of occurrences of $a_j$ in
$x$, so that $k_1 + k_2 + \cdots + k_m = n$. Then,

$$\mathrm{maj}(x_1, x_2, \dots, x_n) = a_k,$$

where $k = \arg\max_j k_j$. ⌟

In Section 3.5, we will show how to compute a majority of values in a
sequence of cells using only one pass over the sequence.

Definition 1.9 (Configuration code). A configuration code is a pair of
functions

$$\varphi_* : \Sigma_2^{\mathbb{Z}} \to \Sigma_1^{\mathbb{Z}}, \quad \varphi^* : \Sigma_1^{\mathbb{Z}} \to \Sigma_2^{\mathbb{Z}}$$

that encodes infinite strings of $\Sigma_2$ into infinite strings of
$\Sigma_1$. Each block code $(\varphi_*, \varphi^*)$ gives rise to a natural
tape configuration code which we will also denote by
$(\varphi_*, \varphi^*)$. If $\xi = \cdots \xi(-1)\xi(0)\xi(1) \cdots$ is an
infinite string of letters of $\Sigma_2$ then $\varphi_*(\xi)$ is the string

$$\cdots \varphi_*(\xi(-1))\, \varphi_*(\xi(0))\, \varphi_*(\xi(1)) \cdots,$$

while for decoding an infinite configuration $\xi'$ we subdivide it first
into blocks of size $Q$ (starting with $\xi'(0) \cdots \xi'(Q-1)$), decode
each block separately, and concatenate the results. ⌟

1.5 The result 

We will define our result in terms of universal Turing machines, operating on 
binary strings as inputs and outputs. 

Definition 1.10 (Computation result). Assume that a Turing machine $M$
starting on binary $x$, at some time $t$ arrives for the first time at some
final state. Then we look at the longest (possibly empty) binary string to
be found starting at position 0 on the tape, and call it the computation
result

$$M(x).$$ ⌟

Definition 1.11 (Universal Turing machine). We say that Turing machine $U$
is universal among Turing machines with binary inputs and outputs, if for
every Turing machine $M$ there is a binary string $p_M$ such that for all
binary strings $x$, $M$ reaches a final state on input $x$ if and only if
$U$ reaches a final state on input $\langle p_M \rangle x$; further, in this
case we have $U(\langle p_M \rangle x) = M(x)$.

A universal Turing machine will be called flexible if whenever
$U(\langle p, q \rangle x)$ halts, also $U(\langle p \rangle q)$ halts, and
$U(\langle p, q \rangle x) = U(U(\langle p \rangle q) x)$. In other words,
if a program has the form $\langle p, q \rangle$ then $U$ first applies as a
preprocessing step the program $p$ to $q$, and then it starts work on the
result attached in front of $x$.



It is well-known that there are flexible universal Turing machines. Let us
fix one and call it $U$.

Consider an arbitrary Turing machine $M$ with state set $\Gamma$, alphabet
$\Sigma$, and transition function $\delta$. A binary string $p$ will be
called a transition program of $M$ if whenever
$\delta(a, q) = (a', q', j)$ we have

$$U(\langle p \rangle \langle a, q \rangle) = \langle a', q', j \rangle.$$

We will also require that the computation induced by the program makes
$O(|p| + |a| + |q|)$ left-right turns, over a tape of length
$O(|p| + |a| + |q|)$. ⌟

The transition program just provides a way to compute the (local)
transition function of $M$ by the universal machine; it does not organize
the rest of the simulation.

Remark 1.12. In the construction provided by the textbooks, the program is
generally a string encoding a table for the transition function $\delta$ of
the simulated machine $M$. Other types of program are imaginable: some
simple transition functions can have much simpler programs. However, our
fixed machine is good enough. Let the fixed program $r$ be such that
$U(\langle r \rangle \langle x, y \rangle) = \langle x \rangle \langle y \rangle$.
If some machine $U'$ simulates $M$ via a very simple program $q$, then $U$
will simulate $M$ via program $\langle r, \langle p_{U'}, q \rangle \rangle$:

$$U(\langle r, \langle p_{U'}, q \rangle \rangle x) = U(\langle p_{U'} \rangle \langle q \rangle x) = U'(\langle q \rangle x) = M(x).$$



For simplicity, we will consider only computations whose result is a single
symbol, at tape position 0:

$$M(x, t).\mathrm{tape}[0]$$

at any time $t$ in which $M(x, t).\mathrm{state}$ is a final state. This
frees us of the problem of having to decode before announcing the final
result of the fault-tolerant computation. We will prove:

Theorem 1.13 (Main). For a given Turing machine $M_2$ with transition
program $p_2$, and positive integer $\beta$, the following items can be
constructed:

• Integers $Q$ depending linearly on $\beta$ and $p_2$, logarithmically on
$|\Sigma_2|$, $|\Gamma_2|$; further $V$ depending quadratically on $Q$;

• A block code $(\varphi_*, \varphi^*)$ of blocksize $Q$;

• A machine $M_1$ whose number of states and alphabet size depend
polynomially on $Q$, with some function $f$ defined on its alphabet;

such that the following holds.

Suppose that on input $x$, the fault-free machine $M_2$ enters a final state
at time $T$. Assume that $\eta$ is a history of machine $M_1$ on starting
configuration $\varphi_*(x)$ such that bursts of faults have size at most
$\beta$, and are separated by at least $V$ steps from each other. Let $t$ be
any time $> VT$ such that no fault occurred in the last $V$ steps before and
including $t$; then

$$f(\eta(t).\mathrm{tape}[0]) = M_2(x, T).\mathrm{tape}[0]. \tag{1.3}$$

Section 2 specifies the layout of the tape and the structure of the states,
and introduces the notion of rules. The parts of the transition function of
$M_1$ dealing with redundant simulation are defined in Section 3. Section 4
introduces the parts that restore the structure of the simulation after it
has been locally garbled by faults. The main theorem is proved in Section 5.



2 Program Structure

2.1 Fields

Each state of the simulating machine $M_1$ will be a tuple
$q = (q_1, q_2, \dots, q_k)$, where the individual elements of the tuple
will be called fields, and will have symbolic names. For example, we will
have fields Info and Drift, and may write $q_1$ as q.Info or just Info,
$q_2$ as q.Drift or Drift, and so on.

We will call the current direction of the simulated machine $M_2$ the drift
($-1$ for left, $0$ for none, and $+1$ for right).

A properly formatted configuration of $M_1$ splits the tape into blocks of
$Q$ consecutive cells called colonies. One colony of the tape of the
simulating machine represents one cell of the simulated machine. The colony
that corresponds to the active cell of the simulated machine (that is the
cell that the simulated machine is scanning) is called the base colony
(later we will give a precise definition of this notion based on the actual
history of the work of $M_1$). Once the drift is known, the union of the
base colony with the neighbor colony in the direction of the drift is called
the extended base colony (more precisely, see Definition 4.2).

The head will make some global sweeping movements over the extended base
colony. We will use the term sweep direction. But even while the sweep
direction does not change, the head will make frequent short switchbacks
(zigzags).

The states of machine $M_1$ will have a field called mode. The normal mode
corresponds to the states where $M_1$ is performing the simulation of $M_2$.
The recovery mode tries to correct some perceived fault.

Similarly, each cell of the tape of $M_1$ consists of several fields. Some
of these have names identical to fields of the state. In describing the
transition rule of $M_1$ we will write, for example, q.Info simply as Info,
and for the corresponding field a.Info of the observed cell symbol $a$ we
will write c.Info. The array of values of the same field of the cells will
be called a track. Thus, we will talk about the c.Hold track of the tape,
corresponding to the c.Hold field of cells.

2.2 Rules 

Instead of writing a single huge transition table, we present the transition 
function as a set of rules. Each rule consists of some conditional statements, 
similar to the ones seen in an ordinary program: "if condition then ...",
where the condition is testing values of some fields of the state and the ob- 
served cell. Even though rules are written like procedures of a program, they 
describe a single transition. When several consecutive statements are given, 
then they (almost always) change different fields of the state or cell symbol, 
so they can be executed simultaneously. Otherwise and in general, even if a 
field is updated in some previous statement, in all following statements that 
use this field, its old value is considered. 
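
The convention that all statements of a rule read the old field values can
be illustrated by a short sketch. The helper `apply_rule` and the
representation of a rule as a list of (field, function) pairs are
assumptions made for this example only.

```python
def apply_rule(fields, statements):
    """Apply all statements of one rule as a single transition.

    Every right-hand side is evaluated on the OLD field values, so the
    order of the statements does not matter.
    """
    old = dict(fields)
    new = dict(fields)
    for target, rhs in statements:
        new[target] = rhs(old)
    return new

# Two statements that would interfere if executed one after the other.
swap = [
    ("a", lambda f: f["b"]),
    ("b", lambda f: f["a"]),
]
print(apply_rule({"a": 1, "b": 2}, swap))  # {'a': 2, 'b': 1}

# An increment followed by a test: the test still sees the old counter.
count = [
    ("n", lambda f: f["n"] + 1),
    ("full", lambda f: f["n"] == 3),
]
print(apply_rule({"n": 3, "full": False}, count))  # {'n': 4, 'full': True}
```

Under sequential execution the swap would produce two equal values; with the
old-value convention it behaves as a simultaneous assignment.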

Rules can call other rules, but these calls will never form a cycle. We will
also use some conventions introduced by the C language: namely,
$x \leftarrow x + 1$ and $x \leftarrow x - 1$ are abbreviated to x++ and x--
respectively.

Rules can also have parameters, like Swing(a, b, u, v) (see below). Since
each rule is called only a constant number of times in the whole program,
the parametrized rule can be simply seen as a shorthand.

2.3 List of fields 

The basic fields of the state and of cells are listed below, with some hints
of their function (this does not replace our later definition of the
transition function). We will not repeat every time that each field of a
cell has also a possible value corresponding to the case when the state is
blank.

We recommend skipping this list at first reading, and returning to it just
for reference.

1. Addr ranges from $-Q$ to $2Q - 1$. The values $-Q$ to $-1$ are taken
during the left drift, while the values $Q$ to $2Q - 1$ during a right
drift. During the first sweep of the work period, Addr is reduced modulo $Q$.

2. Dir stores the direction of the previous step. 

3. Drift stores the direction of the simulated machine $M_2$. It may have
values $\emptyset, -1, 0, 1$. The value $\emptyset$ corresponds to the case
when the new Drift is still not computed, and will also be the default value
(for example in empty cells). The c.Drift field of the cells of the extended
colony corresponds to Drift in the state.

4. Sw numbers the sweeps through the colony. The first sweep of a work
period has number 1 and is to the right, and this way each right sweep is
odd, each left sweep is even. Thus the sweep direction of the head is
completely determined by the parity of Sw, unless the head is at a "turning"
point. At turning points, Sw is incremented.

Field c.Sw holds the number of the most recent sweep. The simulation
consists of a computation phase and a transferring phase, each corresponding
to a certain interval of sweep values to be specified below (these intervals
depend somewhat on the value Drift).

5. The triple of fields (c.Addr, c.Sw, c.Drift) will determine the role
played by a cell in the colony work period: for notational convenience, we
introduce the names

Core = (Addr, Sw, Drift),  c.Core = (c.Addr, c.Sw, c.Drift).  (2.1)

6. The Info and State tracks represent the tape symbols and the state of the 
simulated machine M2. 

7. The sweep-through is interrupted by switchbacks called zigging, described
by a rule Zigzag(d). Here $d$ is the direction of sweep. The process also
depends on a fixed parameter

$$Z = 22\beta, \tag{2.2}$$

and is controlled by the fields ZigDepth and ZigDir of the state. From now
on, we will assume that $Q$ is a multiple of $Z - 4\beta$:

$$(Z - 4\beta) \mid Q. \tag{2.3}$$

Every $Z - 4\beta$ forward steps are accompanied by $Z$ steps backward and
forward, for a total of

$$3Z - 4\beta = 3(Z - 4\beta) + 8\beta < 4(Z - 4\beta)$$

steps.



Figure 1: A sweep of a base colony with zigging.

8. The Mode field takes values in {Normal, Recovering}. In the absence of
faults, the state would never leave the normal mode. On noticing any
disorder, the state will switch to recovery mode, with the goal of
eventually returning to normal mode.

The fields used in recovery mode are all collected as subfields of the field
Rec of the state, and the field c.Rec of the cell state. They will be
introduced in the definition of the recovery rule.

In particular, when the field

c.Rec.Core  (2.4)

is not 0, we will call the cell marked.

9. Even though we store information with redundancy, faults can disturb the
coding and decoding operations and the simulating computation itself.
Therefore these procedures will be repeated several times, and their
results, serving as candidates of the final values of (Info, Drift, State),
will be stored in the c.Hold track. The different candidates will be stored
in the different parts of the c.Hold field, which is actually a small array
c.Hold[1], c.Hold[2], c.Hold[3]. The subfield c.Hold[i].Info holds the value
of the $i$-th candidate for the new Info value of the current cell.

Machine $M_1$ has no final states: even when the simulated computation ends,
the simulation continues to defend the end result from faults.



3 The Simulation 

One computational step of the machine $M_2$ is simulated by many steps of
$M_1$ that make up one unit called the work period.

3.1 Coding 

We will frequently make use of the parameter 

E = 30Z. (3.1) 

For simplicity, let us assume that the set of states $\Gamma_2$, and the
alphabet $\Sigma_2$ are subsets of the set of binary strings $\{0,1\}^\ell$
for some $\ell < Q$ (we can always ignore some states or tape symbols, if we
want). We will then use the same code $(\upsilon_*, \upsilon^*)$ for both
the states of machine $M_2$ and its alphabet. Let $(\upsilon_*, \upsilon^*)$
be a $(\beta, 2)$-burst-error-correcting code

$$\upsilon_* : \{0,1\}^\ell \to \{0,1\}^{Q - 2.2E}.$$

(The length of the code is not $Q$, only $Q - 2.2E$, since we will place the
codeword at a distance $1.1E$ from both colony ends.) We could use, for
example, the repetition code of Example 1.7. Other codes are also
appropriate, but we require that they have some fixed, constant programs
$p_{\mathrm{encode}}$, $p_{\mathrm{decode}}$ on the universal machine $U$,
in the following sense:

$$\upsilon_*(x) = U(\langle p_{\mathrm{encode}} \rangle x), \quad \upsilon^*(y) = U(\langle p_{\mathrm{decode}} \rangle y).$$

Also, these programs must work in quadratic time and linear space on a
one-tape Turing machine (as repetition codes certainly do).

Let $a$ be the tape configuration of $M_2$ at time 0, and $s$ the starting
state of $M_2$. The initial tape configuration $a' = \varphi_*(a)$ of $M_1$
is defined as follows:

$$a'[hQ + 1.1E, \dots, (h+1)Q - 1.1E - 1].\mathit{Info} = \upsilon_*(a[h]), \tag{3.2}$$

$$a'[1.1E, \dots, Q - 1.1E - 1].\mathit{State} = \upsilon_*(s). \tag{3.3}$$



In cells of the base colony and its left neighbor colony, the c.Sw and
c.Drift fields are set to Last(+1) and 1 respectively, where Last(+1)
denotes the last sweep of a working period for the positive drift (and is
defined below in (3.8)). In the right neighbor colony, these values are
Last($-1$) and $-1$ respectively. In all other cells, these values are empty.

The head is initially located at the first cell of the base colony. We
assume that the Addr fields of each colony are filled properly, that is

$$a'[i].\mathit{Addr} = i \bmod Q.$$

The c.Hold values are empty in each cell.

Machine $M_1$ starts in normal mode, with Drift = 1, Sw = 1. All other
fields have also their initial (or empty) values (see Figure 2).

Figure 2: The initial configuration of machine $M_2$ is encoded into the
initial configuration of $M_1$, where L = Last(1), g = Last($-1$).

The corresponding block decoding function $\varphi^*$ is obtained by
applying the decoding function $\upsilon^*$ to just the c.Info track of
$M_1$ (actually to just the part between addresses $1.1E$ and $Q - 1.1E$ of
each colony) to obtain the tape of the simulated machine $M_2$, and to just
the c.State track of the base colony to obtain its state.

This definition of decoding will be refined for configurations different
from the initial one, since the location of the base colony must also be
found using decoding.

3.2 Sweep counter and direction 

The global sweeping movement of the head will be controlled by the
parametrized rule

Swing(a, b, u, v).

This rule makes the head swing between two extreme points $a$, $b$, while
the counter Sw increases from value $u$ to value $v$. The Sw value is
incremented at the "turns" $a$, $b$ (and is also recorded on the track c.Sw).

The sweep direction $\delta$ of the simulating head is derived from Sw, Addr
and the current value Dir in the following way. On arrival of the head to an
endpoint (that is when Dir $\ne 0$ and Addr $\in \{a, b\}$), the values Sw
and c.Sw are incremented and Dir is set to 0. In all other cases, the sweep
direction is determined by the formula

$$\mathrm{dir}(s) = (-1)^{s+1}. \tag{3.4}$$

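Let

$$\delta = \begin{cases} 0 & \text{if } \mathit{Addr} \in \{a, b\} \text{ and } \mathit{Dir} \ne 0, \\ \mathrm{dir}(\mathit{Sw}) & \text{otherwise.} \end{cases}$$

As an example of rules, we present the zigging rule in Rule 3.2, which
itself uses the rule Move(d). At each non-zigging step,
Addr ← Addr + $\delta$.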

Rule 3.1: Move(d)

  Dir ← d   // d ∈ {−1, 0, 1}.
  if Mode = Normal then Addr ← c.Addr ← Addr + d
  else Rec.Addr ← Rec.Addr + d
  Move in direction d.

Rule 3.2: Zigzag(d)

  // d ∈ {−1, 1} is the direction of progress.
  if ZigDir = −1 and ((ZigDepth = 0 and (Z − 4β) | Addr)
      or 0 < ZigDepth < Z) then
    ZigDepth++
    if ZigDepth = Z − 1 then ZigDir ← 1
    Move(−d)
  else if ZigDir = 1 or (ZigDepth = 0 and (Z − 4β) ∤ Addr) then
    if ZigDepth > 0 then ZigDepth-- else ZigDir ← −1
    Move(d)
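
The zigging rule can be checked against the step count $3Z - 4\beta$ stated
with (2.3). The sketch below is one reading of Rule 3.2 under the old-value
convention of Section 2.2; the function names, the restriction to the fields
Addr, ZigDir and ZigDepth, and the choice of starting at address 0 are
assumptions made for this example.

```python
def zigzag_step(addr, zig_dir, zig_depth, d, Z, stride):
    """One application of Zigzag(d); all tests read the old field values.

    stride stands for Z - 4*beta, the progress made between two zigs.
    """
    at_zig_point = zig_depth == 0 and addr % stride == 0
    if zig_dir == -1 and (at_zig_point or 0 < zig_depth < Z):
        # Backward part of the zig: go deeper, turn around at depth Z.
        new_dir = 1 if zig_depth == Z - 1 else zig_dir
        return addr - d, new_dir, zig_depth + 1
    if zig_depth > 0:
        # Forward part of the zig: return to where the zig started.
        return addr + d, zig_dir, zig_depth - 1
    # Ordinary progress step; arm the next zig.
    return addr + d, -1, 0

def steps_for(beta, periods):
    """Steps needed to advance by `periods` strides, starting at address 0."""
    Z = 22 * beta
    stride = Z - 4 * beta
    addr, zig_dir, zig_depth = 0, -1, 0
    steps = 0
    while addr < periods * stride:
        addr, zig_dir, zig_depth = zigzag_step(
            addr, zig_dir, zig_depth, +1, Z, stride)
        steps += 1
    return steps

beta = 1
print(steps_for(beta, 1), 3 * 22 * beta - 4 * beta)  # 62 62
```

Each stride costs $Z$ backward steps, $Z$ forward steps back to the zig
point, and $Z - 4\beta$ steps of progress, which is the total given in the
text.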



3.3 The computation phase 

The aim of this phase is to obtain new values for c.State, c.Drift and
c.Info. It essentially repeats three times the following stages: decoding,
applying the transition, encoding. In more detail:

1. For every $j = 1, 2, 3$ do

(a) Calling by $q$ the string found on the c.State track of the base colony
between addresses $1.1E$ and $Q - 1.1E$, decode it into string
$\tilde q = \upsilon^*(q)$ (this should be the current state of the
simulated machine $M_2$), and store it on some auxiliary track in the base
colony. Do this by simulating the universal machine on some work track, with
the program $p_{\mathrm{decode}}$:
$\tilde q = U(\langle p_{\mathrm{decode}} \rangle q)$.

Proceed similarly with the string $a$ found on the c.Info track of the base
colony, to get $\tilde a = \upsilon^*(a)$ (this should be the observed tape
symbol of the simulated machine $M_2$).

(b) Compute the value

$$(a', q', d) = \delta_2(\tilde a, \tilde q)$$

similarly, simulating the universal machine $U$ with program $p_2$. The
string $p_2$ (as well as the constant-size programs $p_{\mathrm{encode}}$,
$p_{\mathrm{decode}}$) is "hardwired" into the transition function of $M_1$,
that is the program we are writing. More precisely, before performing the
computation of $U(\langle p_2 \rangle \langle \tilde a, \tilde q \rangle)$
of Definition 1.11, machine $M_1$ writes the program $p_2$ onto some work
track: for this, the string $p_2$ will be a literal part of the program of
$M_1$.

(c) Write the encoded new state $\upsilon_*(q')$ onto the c.Hold[j].State
track of the base colony between positions $1.1E$ and $Q - 1.1E$.

Similarly, write the encoded new observed cell content $\upsilon_*(a')$ onto
the c.Hold[j].Info track of the base colony. Write also the first symbol of
$a'$ into position 0 of the same track (just because the Main Theorem 1.13
expects the result of the whole computation at tape position 0, not $1.1E$).
Write $d$ into the c.Hold[j].Drift field of each cell of the base colony.

2. Repeat the following twice:

Sweeping through the base colony, at each cell compute the majority of
c.Hold[j].Info, $j = 1, 2, 3$, and write into the field c.Info. Proceed
similarly, and simultaneously, with State and Drift.

It can be arranged, and we assume so, that the total number of sweeps of
this phase, and thus the starting sweep number of the next phase,

$$\mathrm{TfSt} = O(Q), \tag{3.6}$$

depends only on $Q$.
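
The role of the three candidates in the c.Hold track can be seen in a toy
version of the computation phase. In this sketch the code
$(\upsilon_*, \upsilon^*)$ is replaced by plain repetition over the colony,
the universal machine is replaced by a direct table lookup, and a burst is
modeled as garbling an interval of one candidate; all names and these
simplifications are assumptions made for illustration.

```python
from collections import Counter

def maj(values):
    """Most frequent value among the given ones."""
    return Counter(values).most_common(1)[0][0]

def encode(symbol, n):
    """Toy stand-in for the encoding: repeat the symbol over n cells."""
    return [symbol] * n

def decode(track):
    """Toy stand-in for the decoding: majority over the track."""
    return maj(track)

def compute_phase(info, state, delta2, burst=None):
    """Compute three candidates and take the cell-wise majority.

    burst = (j, lo, hi) garbles cells lo..hi-1 of candidate number j.
    """
    n = len(info)
    hold = []
    for j in range(3):
        a, q = decode(info), decode(state)
        a_new, q_new, d = delta2[(a, q)]
        cand = {"Info": encode(a_new, n),
                "State": encode(q_new, n),
                "Drift": [d] * n}
        if burst is not None and burst[0] == j:
            for field in cand:
                for i in range(burst[1], burst[2]):
                    cand[field][i] = "?"
        hold.append(cand)
    return {field: [maj(h[field][i] for h in hold) for i in range(n)]
            for field in ("Info", "State", "Drift")}

delta2 = {("0", "s"): ("1", "t", +1)}   # hypothetical one-entry table
out = compute_phase(["0"] * 8, ["s"] * 8, delta2, burst=(1, 2, 6))
print(out["Info"], out["Drift"])
```

Since a single burst can damage only one of the three candidates, the
cell-wise majority restores the intended values of all three fields.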



3.4 Transfer phase 

The aim of this phase, present only if Drift $\ne 0$, is to transfer the new
State of $M_2$ into the neighbor colony in the direction of
$\delta$ = Drift (which was computed in the previous phase), and to move
there. TfSw($\delta$) is the transfer sweep, the sweep in which we start
transferring in direction $\delta$:

$$\mathrm{TfSw}(1) = \mathrm{TfSt}, \quad \mathrm{TfSw}(-1) = \mathrm{TfSt} + 1. \tag{3.7}$$

The phase consists of the following actions.

1. Spread the value $\delta$ found in the cells of the c.Drift track (they
should all be the same) onto the neighbor colony in direction $\delta$.

2. For $i = 1, 2, 3$:

Copy the content of the c.State track of the base colony to the
c.Hold[i].State track of the neighbor colony.

3. Assign the field majority: c.State ← maj(c.Hold[1 ... 3].State) in all
cells of the neighbor colony. This part ends with a sweep value TransferLast
depending only on the program $p_2$.

4. If Drift = 1, then move right to cell $Q$ (else stay where you are).

The last sweep number of the work period is

$$\mathrm{Last}(\delta) = \mathrm{TransferLast} + \max(0, \delta). \tag{3.8}$$

The work period in case of both non-zero drift values is illustrated in
Figure 3.

Figure 3: A work period of the machine $M_1$ (zigging and many sweeps are
not shown, for clarity). In (a) the machine drifts left, while in (b) it
drifts right.

3.5 Interval plurality

We give an algorithm that computes the plurality of some field c.F over some
interval, that is, the value that appears the most, but at least 1/3 of the
times. Rule 3.3 is a version of an algorithm from [7]. Running in a single
sweep, the rule maintains a data structure of 2 pairs $(v_i, c_i)$ that
store some candidate majority values and their current weight.

Rule 3.3: Interval-plur(F, G, n)

  // Interval "majority" of the field F, computed and then stored in the
  // field G of the machine's state. Initially (v_i, c_i) = (0, 0), i = 1, 2.
  if the end of the interval of length n is reached then
    i ← arg max_{j ∈ {1,2}} (c_j)
    G ← v_i
  else
    if v_j = c.F or c_j = 0 for some j ∈ {1, 2} then
      v_j ← c.F, c_j++
    else c_1--, c_2--
    (move right)
  // Actually, the swing rule will move the head (with zigging).

3.6 Simulation with no faults 

The proof of Theorem 1.13 uses a simulation of machine $M_2$ by machine
$M_1$. Though the theorem only speaks about the end result, for the sake of
the proof we give a formal definition of simulation. For the moment, we
concentrate on the fault-free case.

Definition 3.1. Let $M_1$, $M_2$ be two machines, further let

$$\varphi_* : \mathrm{Configs}_{M_2} \to \mathrm{Configs}_{M_1}$$

be a mapping from configurations of $M_2$ to those of $M_1$, such that it
maps starting configurations into starting configurations. We will call such
a map a configuration encoding. Let

$$\Phi^* : \mathrm{Histories}_{M_1} \to \mathrm{Histories}_{M_2}$$

be a mapping. The pair $(\varphi_*, \Phi^*)$ of mappings is called a
simulation (of $M_2$ by $M_1$) if whenever $\xi$ is an initial configuration
of $M_2$ and $\eta$ is a trajectory of machine $M_1$ with initial
configuration $\varphi_*(\xi)$, the history $\Phi^*(\eta)$ is a trajectory
of machine $M_2$.

We say that $M_1$ simulates $M_2$ if there is a simulation
$(\varphi_*, \Phi^*)$ of $M_2$ by $M_1$. ⌟

We summarize the construction of the previous section in the following 
statement. 

Lemma 3.2. Machine M1 simulates machine M2.

Proof. Since there are no faults interfering with the operation of M1, the
history of M1 is a trajectory, easy to break up into work periods: intervals in
which the counter Sw is growing. Let $\tau_t$ be the end of the $t$-th work period.
(Though $\tau_t$ is roughly proportional to $t$, we did not make it an exact multiple.
Such a relation would be lost in the faulty case, anyway.)

The code $(\varphi_*, \varphi^*)$ is given in Section 3.1. The history decoding function
$\Phi^*$ for the noise-free case is

$$\Phi^*(\eta)(t) = \varphi^*(\eta(\tau_t)),$$

where $\varphi^*$ is the tape configuration decoding function obtained from the block
code $(\varphi_*, \varphi^*)$.

If M2 reaches a final state at step $t$, then starting from step $\tau_t$, machine M1
will not change the state represented on the c.Info track anymore. □

We will define formally later in Definition 4.5 what it means for the state 
to be coordinated with the observed cell. This is always the case in the noise-
free simulation, so let us display the "main" rule of machine M1 in Rule 3.4.
Recall the definition of marked in (2.4). 

Rule 3.4: Main rule 

if Mode = Normal then
    if not Coordinated or the cell is marked for recovery then Alarm
    else if $1 \le Sw < TfSt$ then Compute
    else if $TfSt \le Sw < Last$ then Transfer
    else if $Last \le Sw$ then MoveBase

Here, rule MoveBase just moves the head to the new base in case it is not 
there yet. 
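As a rough illustration of how the main rule dispatches, here is a minimal Python sketch. The `HeadState` container, the string labels and the treatment of modes other than Normal and of Sw < 1 are assumptions made for the example; the half-open sweep ranges are one reading of the conditions of the rule.

```python
from dataclasses import dataclass


@dataclass
class HeadState:
    mode: str          # "Normal" or "Recovering"
    sw: int            # current sweep counter Sw
    coordinated: bool  # outcome of the coordination check
    cell_marked: bool  # is the observed cell marked for recovery?


def main_rule(state: HeadState, tf_st: int, last: int) -> str:
    """Return the name of the sub-rule that the main rule would call."""
    if state.mode != "Normal":
        # Assumption: outside normal mode the recovery rules take over.
        return "Recovery"
    if not state.coordinated or state.cell_marked:
        return "Alarm"
    if 1 <= state.sw < tf_st:
        return "Compute"
    if tf_st <= state.sw < last:
        return "Transfer"
    if last <= state.sw:
        return "MoveBase"
    # Assumption: Sw < 1 is not covered by the rule as displayed.
    raise ValueError("sweep value below 1 is not handled by the main rule")
```

With `tf_st = 5` and `last = 9`, a coordinated head on an unmarked cell is sent to `Compute` for Sw = 3, to `Transfer` for Sw = 5, and to `MoveBase` for Sw = 9.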
4 Faults

A fault is a violation of the transition function. A burst of faults can change
the state to an arbitrary one, and change an interval of cells of size $\beta$ arbi-
trarily. We will call such an interval an "island of bad cells" here informally,
later formally.

Faults cause two kinds of change. One is that they change the information 
about the represented machine M2. This problem will be corrected with the 
help of redundancy (encoding of the information and repetition of computa- 
tion). The second kind of change affects the very structure of the simulation. 
These changes will be detected and corrected locally, by the recovery rule. 

When a coordination check fails, we will switch to recovery mode. Recov- 
ery will start with trying to identify a small interval containing the damage. 
This is followed by restoring the "structure" (addresses, c.Sw and c.Drift
values) in the interval.

4.1 Integrity 

Let us specify the kind of structural integrity we expect a configuration to 
have. Informally, in the absence of faults, "outer" cells are those outside the 
base colony, and even outside the area (to be called workspace) in which the 
program extends it in the transfer phase. 

Definition 4.1 (Outer cells). Recall the definition of the sweep value Last($\delta$)
from (3.8). For $\delta \in \{-1, 1\}$, if a cell is nonempty and has $0 \le c.Addr < Q$,
c.Drift = $\delta$, c.Sw = Last($\delta$) then it will be called a right outer cell if $\delta = -1$.
It is a left outer cell if $\delta = 1$. If it is empty then it will be considered both a
left outer cell and a right outer cell.

Definition 4.2 (Healthy configuration, base colony, extended base colony,
workspace). A configuration $\xi$ is healthy if the mode is normal, further the
following holds.

Let $d$ denote the direction of sweep, as determined by (3.5). Recall that
$\xi.pos$ is the head position. We define the position $f = \mathrm{front}(\xi)$, called the
front, by

$$\mathrm{front}(\xi) = \xi.pos + \mathit{ZigDepth} \cdot d.$$

This is the farthest position to which the head has advanced before starting
a new backward zig.

Let $\delta$ = Drift. Recall the definition of the transfer sweep TfSw($\delta$) in (3.7),
if $\delta \ne 0$. There is no transfer sweep if $\delta = 0$. We require:

Colonies The non-blank cells of the tape form a single segment, subdivided 
into colonies, starting from the base defined by counting back from Addr 
(this is not necessarily the origin of the tape). The leftmost colony and 
rightmost colony may be only partially filled. 

The colony starting at the base is called the base colony. There is also an
extended base colony X: this is obtained by extending the base colony in
the direction $\delta$, provided Sw > TfSt (defined in (3.6)).

The front front($\xi$) is always in the extended base colony. The drift of
nonempty outer cells points towards the base colony.

Workspace The non-outer cells form a single interval called workspace, with
the following properties:

• For Sw < TfSw($\delta$), it is equal to the base colony.

• In case of Sw = TfSw($\delta$), it is the smallest interval including the base
colony and the cell adjacent to front($\xi$) on the side of the base colony.

• If TfSw($\delta$) < Sw < Last($\delta$), then it is the extended base colony.

• When Sw = Last($\delta$), it is the smallest interval including the future
base colony and front($\xi$) (it is shrinking onto the future base colony).

The field c.Addr varies continuously over the workspace in all these cases,
except possibly Sw = 1.

Sweep For 1 < c.Sw < Last($\delta$), we have c.Sw(x) = Sw in all cells x behind
front($\xi$) in the workspace. For 1 < c.Sw, we have c.Sw(x) = Sw − 1 in all
cells x ahead of front($\xi$) (inclusive) in the workspace.

Addresses Consider addresses c.Addr in the workspace. Except for Sw = 1,
they increase continuously.

In the first sweep, the address track c.Addr is either $[-Q, 0)$ or $[Q, 2Q)$,
but reduced modulo $Q$ on the segment $[0, \mathrm{front}(\xi))$.

Drift If Sw > TfSt or Sw = 1 then c.Drift is constant on the workspace.

Simulated content The Info and State tracks contain valid codewords as
defined in Section 3.1.

Normality All cells are unmarked, that is c.Rec.Core = 0 throughout (see
the definition of marking after (2.4)).



The following observation comes directly from the definition of health. 

Lemma 4.3. In a healthy configuration, a cell is either under the head, or
is in the workspace, or is an outer cell.

Definition 4.4 (Local configuration, replacement). A local configuration on
a (finite or infinite) interval $I$ is given by values assigned to the cells of $I$,
along with the following information: whether the head is to the left of, to
the right of or inside $I$, and if it is inside, on which cell, and what is the
state.

If $I'$ is a subinterval of $I$, then a local configuration $\xi$ on $I$ clearly gives
rise to a local configuration $\xi(I')$ on $I'$ as well, called its subconfiguration: If
the head of $\xi$ was in $I$ and it was for example to the left of $I'$, then now $\xi(I')$
just says that it is to the left, without specifying position and state.

Let $\xi$ be a configuration and $\zeta(I)$ a local configuration that contains the
head if and only if $\xi(I)$ contains the head. Then the configuration $\xi|\zeta(I)$ is
obtained by replacing $\xi$ with $\zeta$ over the interval $I$, further if $\xi$ contains the
head then also replacing $\xi.pos$ with $\zeta.pos$ and $\xi.state$ with $\zeta.state$.

Definition 4.5 (Coordination). The state is called coordinated with the
content of the observed cell if it is possible for them to be in some healthy
configuration.

Of course, it would be possible to give a finite table describing the coordi- 
nation conditions. But we just point out some consequences of coordination 
we will use later: 

Lemma 4.6 (Coordination). Each Core = (Addr, Sw, Drift) value deter-
mines uniquely the c.Core value of the cell it is coordinated with.

In the reverse direction, the relation is less strict: each (c.Addr, c.Sw) pair
determines uniquely the Addr that can be coordinated with it, and requires
Sw $\in$ {c.Sw, c.Sw + 1}, with the following exception: when c.Addr is within
$4\beta$ of a colony end.

Proof. The exception comes from the fact that there are two ways for the
head to step onto cells of a neighbor colony: either during the transfer sweep,
or at times when the head makes a turn at the end of a sweep, and after
moving forward $Z - 4\beta$ steps, zigs back $Z$ steps, thereby reaching $4\beta$ steps
into the neighbor. □

To describe the self-correction process, we need to characterize the kind 
of configurations that can be found during it. We cannot hope to restore 
health in all islands created by faults, in a very short time after the faults 
occurred. Indeed, as seen from Figure 4, it may happen that a burst creates 
an island, but leaves it with a state of the head that will not require it to 
zig back anymore. Moreover, this may happen in the last sweep of a work 
period, while moving the base, say, to the left: so the island created this way 
will be seen next, if ever, only if the simulated computation transfers the 
base right again. 

The following definition classifies the kinds of alteration that noise can
bring to a healthy configuration. Informally, in islands, the structure may
have been damaged, while in stains, only the c.Info and c.State tracks could
be. The distress area is where structure is currently being restored. Re-
call that the Core = (Addr, Sw, Drift), c.Core and c.Rec.Core tracks were
introduced in (2.1) and (2.4).





[Figure 4, panels (a) and (b): diagrams of the tape showing the base colony, the island, the front, and the tape regions with sweep values Sweep - 1, Sweep and Sweep + 1.]



Figure 4: (a) A burst during the last visit of the colony, at the bottom of a zig.
It puts the state into normal mode, with appropriate values of ZigDepth and
ZigDir. This leaves the created island "undetected" until the head returns
to the colony. (b) A burst switches the sweep value, causing the head to move
forward and leaving an island and a part of the tape without incremented
sweep number.
Definition 4.7 (Annotated configuration). An annotated configuration is a
tuple

$$(\xi, \chi, \mathcal{I}, \mathcal{S}, D),$$

with the following meaning: $\xi$ is a configuration, $\chi$ is a healthy configuration,
$\mathcal{I}$ is a set of intervals of cells called islands, further $\mathcal{S} \supset \mathcal{I}$ is a set of intervals
of cells called stains, $D$ is an interval containing the head called the distress
area.

The distress area contains any island containing the head.

Islands and stains are of size $\le \beta$. The distress area has size $\le 3E$.

We can obtain $\chi$ from $\xi$ by changing

• the c.Core and c.Rec.Core tracks in the islands and possibly at most
$Z - 3\beta$ additional cells within $D$;

• the c.Info and c.State track in the stains;

• the state, the c.Rec.Core track in $D$, and the head position inside $D$.

We say that an interval $W$ is the workspace of the annotated configuration
A if it is the workspace of $\chi$.

The following additional properties are required: 

Islands At most one island intersects the workspace. There are at most 2
islands in each colony that do not intersect the workspace. If there is more
than one, then one is within distance $E + 5\beta$ from the colony boundary
towards the base colony.

Stains In the base colony, either all stains but one are within a distance
$E + \beta$ to the left colony boundary, or all but one are within a distance
$E + \beta$ to the right colony boundary. In all other colonies, all stains but
one are within distance $E + \beta$ of the boundary towards the base colony.

Distress If $D$ is empty then the mode is normal.

We say that a cell is free in an annotated configuration when it is not
in any island or $D$. The head is free when $D$ is empty. An annotated
configuration is centrally consistent if the workspace is free.
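The notions of free cell, free head and central consistency can be sketched in a few lines of Python. The class name `Annotation`, the representation of intervals as half-open pairs of integers, and the use of `None` for an empty distress area are illustrative assumptions, not notation from the text; stains are omitted because they do not enter these three notions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), assumed nonempty


def intersects(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one cell."""
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class Annotation:
    workspace: Interval
    islands: List[Interval] = field(default_factory=list)
    distress: Optional[Interval] = None  # None stands for an empty D

    def cell_is_free(self, x: int) -> bool:
        """A cell is free if it lies in no island and not in D."""
        cell = (x, x + 1)
        if self.distress is not None and intersects(cell, self.distress):
            return False
        return not any(intersects(cell, isl) for isl in self.islands)

    def head_is_free(self) -> bool:
        """The head is free when D is empty."""
        return self.distress is None

    def centrally_consistent(self) -> bool:
        """Every workspace cell is free."""
        blocked = list(self.islands)
        if self.distress is not None:
            blocked.append(self.distress)
        return not any(intersects(self.workspace, b) for b in blocked)
```

For instance, with workspace `(0, 100)` and a single island `(120, 125)`, the configuration is centrally consistent, while an island `(40, 45)` would make it inconsistent.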
Definition 4.8 (Admissible configuration). A configuration $\xi$ is admissible
if there is an annotated configuration $(\xi, \chi, \mathcal{I}, \mathcal{S}, D)$. In this case, we say
that $\chi$ is a healthy configuration satisfying $\xi$. Any change to an admissi-
ble configuration is called admissible, if the resulting configuration is also
admissible.

The following key lemma shows that an admissible configuration can be 
locally corrected. Recall outer cells from Definition 4.1. 

Lemma 4.9 (Correction). Consider an annotated configuration

$$(\xi, \chi, \mathcal{I}, \mathcal{S}, D)$$

and an interval $R = [a, b)$ of length $2E$, further

$R_j = [a + 0.1iE,\; b - 0.1jE)$ for $i, j = 0, 2, 4$.

Assume that either in the left half or the right half of $R$, at least $E - 3\beta$
cells of $\xi(R)$ are nonempty. Then it is possible to compute from $\xi.c.Core(R)$
an interval $\hat{R} \in \{R, R_0, R_4\}$, a local configuration $\zeta = \zeta(\hat{R})$ with no empty
cells, such that $\chi|\zeta(\hat{R})$ is healthy, and the following holds:

(a) If $\chi.pos \in R_2$ then $\hat{R} = R$, $\zeta.pos \in \hat{R}$ and $\zeta.ZigDepth = 0$.

If $\chi.pos < a + 0.2E$ then $\hat{R} = R_4$, and $\zeta.pos$ is to the left of $\hat{R}$. Similarly,
if $\chi.pos > b - 0.2E$ then $\hat{R} = R_0$, and $\zeta.pos$ is to the right of $\hat{R}$.

(b) The states of nonempty cells of $\xi$ can differ from the corresponding cells
of $\zeta$ only in the islands, and in at most $Z - 3\beta$ additional positions in
an interval in $D$ containing $\chi.front$.

(c) The computation of $\zeta$ can be carried out by the machine M1 (relying only
on $\xi$ and $R$), using a constant number (independent of $\beta$, $Q$) of passes
over $R$, and a constant number of fields containing values of size $< Q$.

Proof. For any interval $I$, let $\alpha(I)$ denote the majority value of $\xi.c.Addr(x) -
(x - a)$ over $I$, further $\sigma(I)$ and $\delta(I)$ the majority value of $\xi.c.Sw(x)$, and
$\xi.c.Drift(x)$ over $I$. Let $m_\alpha(I)$, and so on, denote the multiplicity of $\alpha(I)$,
and so on, over interval $I$.

We now outline a procedure that finds $\hat{R}$ and $\zeta$. Even if the reasoning below
refers to the healthy configuration $\chi$ occasionally, the computation only relies
on the configuration $\xi$. Whenever we write plurality, we mean a value with
multiplicity larger than 1/3 of the total. Empty cells are not counted in the
total, and do not contribute to the counts.

1. We have the following, not mutually exclusive possibilities, that can be 
checked: 

• All but $3\beta$ cells of the left/right half of $R$ are outer cells.

• All but $3\beta$ cells of the left/right half of $R$ are workspace cells.

Proof. This follows from the fact that in the healthy configuration $\chi$, the
workspace is surrounded by outer cells.

2. Assume that at least $1.7E$ cells of $R$ are left outer cells of $\xi$, or at least
$1.7E$ are right outer cells.

Without loss of generality, assume that at least $1.7E$ cells in $R$ are left
outer cells: set $\hat{R} \leftarrow R_0$. The value $\sigma[a, b - E/2)$ is necessarily Last(1).
Let $\alpha = \alpha[a, b - E/2)$. Setting $\zeta.c.Addr(x) = \alpha + x - a$, $\zeta.c.Sw$ = Last(1),
and $\zeta.c.Drift = -1$ defines $\zeta.c.Core(x)$ accordingly for all $x$ in $\hat{R}$ (not
leaving empty cells).

Assume that the above test fails: then given that $R$ intersects at most 3
islands, we can assume that at least $0.3E - 3\beta$ cells of $R$ belong to the
workspace of the healthy configuration $\chi$.

3. Suppose that $\xi$ has at most $3\beta$ outer cells in $R$.
Then the non-workspace cells of

$$R^- = [a + 3\beta,\; b - 3\beta)$$

are all island cells, since the non-island non-workspace cells of $R$ must all
be at the ends.

Compute $\sigma(R^-)$, and assume without loss of generality $\mathrm{dir}(\sigma) = 1$ (the
right sweep).

We claim

$$\chi.front - Z < a^+ + m_\sigma < \chi.front + Z,$$

where $a^+ = a + 3\beta$. Indeed, in the healthy configuration $\chi$, the right-
sweeping cells inside $R^-$ form an interval on the left of $\chi.front$. By the
definition of annotation, $m_\sigma$ could differ from the size of this interval only
due to island cells, and possibly an interval of size at most $Z - 3\beta$ containing
$\chi.front$.

3.1. Compute the addresses and sweep values.

Recall that we assumed $\mathrm{dir}(\sigma) = 1$. First we compute the candidate
address and sweep values in $[a, a^+ + m_\sigma)$.

Note that $\xi.c.Addr(x) - (x - a)$ should be constant as $x$ runs on all
non-island workspace cells of $R^-$ with the above plurality value of $\sigma$.
Therefore it has some majority value $\alpha$.

For cells $x$ in $[a, a^+ + m_\sigma)$, let $\zeta.c.Sw(x) \leftarrow \sigma$, $\zeta.c.Addr(x) \leftarrow \alpha + x - a$.
This can change only island cells or shift the front to the left by a number
of cells equal to the number of island cells encountered in this interval.

If $a^+ + m_\sigma > b - 0.3E$, then set $\hat{R} \leftarrow R_0$.

Assume now $a^+ + m_\sigma \le b - 0.3E$, and set $\hat{R} \leftarrow R$.

Now we compute the candidate address and sweep values in $[a^+ + m_\sigma, b^-)$.
Let $\sigma'$ be the majority value of $\xi.c.Sw$ in $[a^+ + m_\sigma, b^-)$, where $b^- = b - 3\beta$
(the majority exists, due to admissibility). Note that $\xi.c.Addr(x) - (x - a)$
and $\xi.c.Sw(x)$ should be constant for almost all $x$ in $[a^+ + m_\sigma, b^-)$.
For $x$ in $[a^+ + m_\sigma, b)$, set $\zeta.c.Sw(x) \leftarrow \sigma'$, $\zeta.c.Addr(x) \leftarrow \alpha' + x - a$.
This again can only change island cells or possibly some cells due to the
left shift of the front.

In this way, the total number of cell changes is at most $3\beta + Z$: at most
$3\beta$ in the islands and $3\beta + (Z - 3\beta)$ due to the shift of the front.

3.2. Compute the remainder of $\zeta$.

Assume first $|\sigma' - \sigma| = 1$, that is two consecutive sweep values within a
work period.

If $\sigma' < \sigma$, then set $\zeta.front \leftarrow a^+ + m_\sigma$, $\zeta.Sw \leftarrow \sigma$; otherwise $\zeta.front \leftarrow
a^+ + m_\sigma - 1$, $\zeta.Sw \leftarrow \sigma'$. If $\min(\sigma, \sigma') >$ TfSt then all over $R$, set the
$\zeta.c.Drift$ values to the majority of the $\xi.c.Drift$ values over $R$.

Assume now that $\sigma, \sigma'$ are the two values corresponding to the transition
to a new work period.

By assumption $\sigma = 1$; we set $\zeta.front \leftarrow a^+ + m_\sigma$.

The value $\chi.c.Drift$ has a constant value $\delta$ on $R$. We can determine it
using majority of $\xi.c.Drift$ over $R^-$, and replace it all over $R$.
Assume that the test 3 also fails: then R intersects the workspace without 
being essentially contained in it, or essentially disjoint from it. From the 
four possibilities of part 1 above, now only these remained: one half of R is 
essentially covered by workspace cells and the other one is not, or one half 
of R is essentially covered by outer cells and the other one is not. We can 
therefore make, without loss of generality, the following assumption: 

4. Assume that the left half of $R$ is not covered essentially (that is to within
$3\beta$) by outer cells. Also, either the left half is covered essentially by
workspace cells, or the right half is covered essentially by outer cells.

Let $m$ be the number of workspace cells of $\xi$ in $R$. Then the intersection
of the workspace with $R$ must agree within $3\beta + Z$ with $[a, a + m)$, just
as above in part 3.1. Now we can carry out the computations of part 3 in
the interval $[a, a + m)$ in place of $R^-$. Since $m > 0.3E - Z$, there will be
still sufficient cells left in this interval for the correct computation of the
majorities.
It is straightforward to check that the conditions of the lemma guarantee
that the construction of $\zeta$ has the properties claimed in the lemma.

□

Assuming that the conditions of Lemma 4.9 hold, it is clearly possible to
compute a constant upper bound on the number of sweeps of the domain $R$
needed for the machine M1 to perform the calculations, resulting in a bound
$O(\beta)$ on the total number of steps used.

Definition 4.10 (Correction data). The following information A =
$(s, \alpha, \alpha', \sigma, \sigma', \delta, f)$ incorporates all the data defining the corrected healthy
local configuration $\zeta(\hat{R})$, provided $R$ is given:

• $s \in \{-1, 0, 1\}$ says which of the three alternative values $R_0$, $R$, $R_4$ does
$\hat{R}$ have. (In this case, $\hat{R} = [a + \max(0, s) \cdot 0.4E,\; b + \min(0, s) \cdot 0.4E]$.)

• $\alpha, \alpha', \sigma, \sigma'$ help computing the address and sweep values as seen in the
proof of the Correction lemma.

• $\delta$ is the c.Drift value shared by all elements of the workspace inside $R$ in
case $\sigma >$ TfSt or $\sigma = 1$.

• $f = \mathrm{front}(\zeta) - a$ in case $\hat{R} = R$.

4.2 Recovery procedure 

Starting from a point $x$, the recovery procedure opens an interval

$$R = z_1 + [-E, E), \quad\text{with } z_0 = z_1 - E,$$

to which it applies the algorithm of the proof of the Correction Lemma 4.9.
There is a point in which the Correction Lemma did not specify all the
changes: when $\hat{R} = R_0$ then it only said that the head should go to the right
of $\hat{R}$, not the exact place where it should go. In this case, the procedure will
put the head at $z_1 + E$, that is all the way to the right edge of $R$. Similarly,
if $\hat{R} = R_4$ then the head goes to $z_1 - E$, that is on the left edge of $R$. In
both cases we set ZigDepth = 0.

The following example shows the need for a careful choice of the recovery 
interval. 

Example 4.11 (Motivation for aligned recovery intervals). Denote $C(b)$ a
colony with starting point $b$. Consider the following scenario. During the
rightward transferring sweep to colony $C(b)$, while within distance $E - 4\beta$ of
the right boundary $b + Q$, the head hits an island, calling alarm. The recovery
procedure opens a recovery interval and proceeds to work on it. Now, while
the head is on the right boundary of this interval, a burst occurs. As a result
of this burst, nothing changes inside the recovery interval, or in the head
position or the state, but an island $I$ is created on the right, outside of the
recovery interval. Assume that the computation from now on continues to
the left of $b + Q$. In some much later work period, at the last sweep before
moving left from colony $C(b)$, a burst leaves an island within distance $E - 5\beta$
from $b + Q$. Then, in some much later work period, during the transferring
sweep to $C(b)$, the head hits this new island and the recovery starts. Now we
repeat the same scenario as above, creating an island $I - \beta$ which will stay
there. If we continue with this adversarial way of putting islands, the entire
interval $b + Q + [0, E + \beta)$ can be covered by islands. Then, much later, in
a transfer to colony $C(b + Q)$, the algorithm of the Correction Lemma 4.9
may be defeated.
To prevent such scenarios, the recovery procedure will try to ensure that
the recovery interval has the following special property.

Definition 4.12. An interval is called aligned if its endpoints are divisible
by $E$. We require

$$E \mid Q. \tag{4.1}$$



For controlling the details, the procedure uses the field Rec.Addr to mea-
sure the distance from point $a$, and a field Rec.Sw to measure the progress,
just as in the main program. There are corresponding c.Rec.Addr and
c.Rec.Sw fields in the cells. According to the values of Rec.Sw, we distin-
guish stages, and introduce the pseudo field Stage (it is just a function of
Rec.Sw), with values

Stage $\in$ {Marking, Planning$_i$ ($i = 1, 2$), Mopping$_i$ ($i = 1, 2$)}.

The process makes use of a number of rules: Alarm, Mark, Plan(i), Mop(i) for
$i = 1, 2$. Whenever we say that a rule "checks" something, it is understood
that if the check fails, alarm is called. In all rules but in Mop, wherever
the head steps, it walks on marked cells or it marks them, that is it sets
c.Rec.Core $\ne 0$. The rule Mop is devoted to unmarking. Zigging will be
performed using the fields

Rec.ZigDepth, Rec.ZigDir,

and the constant parameter

$$Rec.Z = 11\beta. \tag{4.2}$$

However, even while zigging, the head stays strictly within the recovery in-
terval.

The following rule is going to run simultaneously through all the rest of 
the recovery procedure. 

• Check if c.Rec.Addr = Rec.Addr + $d$, where $d = \pm 1$ is the direction of the
sweep.

• If not zigging, check if c.Rec.Sw = Rec.Sw − 1. If zigging, check if
c.Rec.Sw = Rec.Sw.

• Update the field Rec.Addr in every move, increasing or decreasing it as we
move left or right.

1. The rule Alarm sets Mode $\leftarrow$ Recovering, Stage $\leftarrow$ Marking.

2. Rule Mark locates and marks the recovery area with c.Rec.Core $\leftarrow$ 1, and
moves to $z_0$. (The meaning of the value 1 is that the cell is marked, but
we did not assign any useful Core values to it yet.) It alarms if any of the
cells along the way that it expects to be already marked is not.

In order to mark $R$, the head moves in a zigging way, similarly to what is
done in the main simulation, as described in point 7 of Section 2, except
that we do not go outside the interval $R$. Zigging makes sure not to mark
too many cells in one sweep or without checking that they are marked
consistently with what was marked before.

After determining the interval $R$ from examining a segment of $14\beta$ cells,
the rule marks one half of this interval, then passes over the marked half
to mark the other half. Here are the details.

Let $[x_0, x_1)$ be the aligned interval of length $E$ containing the cell $x$ where
alarm was called.

i. This part starts from a cell $x$ (where the alarm was called), and ends
on cell $x + 7\beta$. In its sweep 1, moving left, it remembers the majority
of $c.Addr(y) - (y - x) \bmod Q$ for $y \in x + [-7\beta, 0)$ as a candidate
modulo $Q$ address $\lambda_{-1}$ for $x$. If there is no such majority, the value is
undefined. It also computes a majority sweep value $\sigma_{-1}$ if a majority
exists. Now, the machine turns right and while passing over $[x, x + 7\beta)$
it computes $\lambda_1$ and $\sigma_1$ similarly. Admissibility implies that if both $\lambda_j$
are defined then they are equal. Moreover, if both are defined then
at least one of the $\sigma_j$ is defined.

From these values, we will compute a candidate mod $Q$ address $\lambda$ and
a candidate direction $\delta$ as follows.

(1) If one of the pairs $(\lambda_j, \sigma_j)$ is defined and the other one is not,
then $\lambda \leftarrow \lambda_j$, $\delta \leftarrow -j$ (direction is towards the undefined pair).
Otherwise let $\lambda$ be the common value of the $\lambda_j$.

(2) If $\sigma_{j'} < \sigma_j \le \sigma_{j'} + 1$ or $\sigma_j < \sigma_{j'}$ then $\delta \leftarrow (-1)^{\sigma_j + 1}$, that is $\delta$ is
the direction of the current sweep as defined in (3.4).
From $\lambda$ we can compute the values $x_0, x_1$. Now we determine $z_1$ as
follows: If $|x - x_j| < 0.2E$ for some $j$ then let $z_1 = x_j$. Otherwise, let
$z_1 = x_j$ for the $x_j$ with $\mathrm{sign}(x_j - x) = \delta$.

The rule achieves the following conditions.

a. The point $x$ is in $R$, at least $0.2E$ away from its boundary.

b. If $|x - \mathrm{front}(\chi)| < 0.1E$, then $R$ reaches less than $1.3E$ backwards
from $\mathrm{front}(\chi)$.

Indeed, without loss of generality assume that the direction of the
sweep is 1. From $x - z_1 < 0.2E$, we obtain

$$x - z_1 + E < 1.2E. \tag{4.3}$$

From our assumption and (3.1) we have $x > \chi.front - 0.1E$. Ap-
plying it to (4.3) yields $\chi.front - (z_1 - E) < 1.3E$.

At the end of this rule, being in a cell $y$, we set the field

Rec.Addr = $y - z_0$.

3. The rule RangeCheck checks that all cells of $R$ are marked.

4. Rule Calculate carries out, over interval $R$, the algorithm of the Correc-
tion Lemma 4.9 to determine the interval $\hat{R}$ and the local configuration
$\zeta(\hat{R})$. If none of the cases apply in the algorithm described in the proof,
the rule calls alarm. It remembers the computation result in a field A as
given in Definition 4.10.

5. Stages Planning$_1$ and Planning$_2$ follow each other. Stage Planning$_i$ calls
rule Plan(i).

Plan(i) calls RangeCheck and then Calculate. In case $i = 1$, it
writes the resulting $\zeta.c.Core$ values on the c.Rec.Core track of $\hat{R}$, and
c.Rec.Core $\leftarrow$ 1 into $R \setminus \hat{R}$. In case $i = 2$, it just checks whether the result
is equal to the existing values of c.Rec.Core.

6. Stages Mopping$_1$ and Mopping$_2$ also follow each other. Rule Mop(1) un-
marks the cells over $R$, setting c.Core $\leftarrow$ c.Rec.Core at the same time, if
c.Rec.Core $\notin \{0, 1\}$. It relies on the field Rec.Addr measuring the distance
$x - z_1$ of the current cell $x$ from $z_1$, and also on part $f$ of the data A
introduced in Definition 4.10 (and computed in each stage Planning$_i$).
If $\hat{R} = R_0$ then Rule Mop(1) moves from the left end of $R$ to the right
end while unmarking, and stays there. If $\hat{R} = R_4$ then it moves from
the right end to the left end while unmarking. Otherwise, it first moves
to the end of $R$ in direction $-\mathrm{dir}(\zeta.Sw)$ (that is backward from the sweep
direction from $\zeta$), and then erases the marks up to position $\zeta.front$. Then
Rule Mop(2) follows, which is similar, but works from the other direction,
ending up at $\zeta.front$ with no marked cells.

Zigging is used during the mopping stage just as during the marking stage. 
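The choice of the aligned recovery interval made by rule Mark can be illustrated by a small Python sketch. It rests on simplifying assumptions: cell positions are absolute integers (the machine itself only knows addresses modulo Q), the candidate direction is already given as +1 or -1, and the function name `choose_recovery_interval` is hypothetical.

```python
from typing import Tuple


def choose_recovery_interval(x: int, E: int, direction: int) -> Tuple[int, int]:
    """Return the half-open recovery interval [z1 - E, z1 + E).

    x         -- cell where alarm was called
    E         -- alignment unit; aligned endpoints are multiples of E
    direction -- candidate direction, +1 or -1
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    # Aligned interval [x0, x1) of length E containing x.
    x0 = (x // E) * E
    x1 = x0 + E
    # If x is closer than 0.2E to an endpoint, center on that endpoint.
    # The test 5 * |x - xj| < E avoids floating point arithmetic.
    near = [xj for xj in (x0, x1) if 5 * abs(x - xj) < E]
    if near:
        z1 = near[0]
    else:
        # Otherwise pick the endpoint lying in the candidate direction.
        z1 = x1 if direction == 1 else x0
    return (z1 - E, z1 + E)
```

With E = 100, an alarm at x = 230 and direction +1 gives the interval [200, 400), while an alarm at x = 210 gives [100, 300) for either direction. In both cases x stays at least 0.2E away from the interval boundary, and both endpoints are multiples of E.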

Remark 4.13 (More on alignment). One solution for the problem presented
in Example 4.11 would be to zig also outside of the recovery interval during
the mopping phase. However, this would open the door for the errors to
influence the recovery interval in a sliding manner in yet another, but similar
way. Alignment snaps the interval $R$ always to center on the colony boundary,
preventing a sliding contamination with islands. (Stains can still be created
in the neighbor colony, but as we will see later in Lemma 5.6, they stay
within $E + \beta$ cells from the colony boundary.)

It is easy to check that the recovery procedure uses only a constant num-
ber of sweeps, for a total number of steps

$$K_r = O(\beta). \tag{4.4}$$



5 Proof of the Main Theorem 

It is useful to spell out the kind of simulation that machine M1 performs.

Definition 5.1. A computation history in the sense of Definition 1.3 is a
$(\beta, V)$-noisy trajectory, if faults in it are confined to bursts of size $\le \beta$ sepa-
rated by time intervals of size $\ge V$.

A pair of mappings $(\varphi_*, \Phi^*)$ in the sense of Definition 3.1 is a $(\beta, V)$-
tolerant simulation of Turing machine M2 by Turing machine M1 if for every
string $x \in \Sigma_2^*$ and every $(\beta, V)$-noisy trajectory $\eta$ of M1 whose initial configura-
tion is $\varphi_*(x)$, the history $\Phi^*(\eta)$ is a trajectory of M2.
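The burst condition of this definition can be checked mechanically on a list of fault times. The following Python sketch is only an illustration under stated assumptions: a burst is taken to be a set of faults falling within a window of fewer than beta consecutive steps after its first fault, the separation is measured from the last fault of one burst to the first fault of the next, and V >= beta is assumed so that the greedy grouping is forced. The function name `is_noisy_trajectory` is hypothetical.

```python
from typing import Sequence


def is_noisy_trajectory(fault_times: Sequence[int], beta: int, V: int) -> bool:
    """Check that faults form bursts of size <= beta separated by >= V steps."""
    times = sorted(set(fault_times))
    i = 0
    prev_end = None  # time of the last fault of the previous burst
    while i < len(times):
        start = times[i]
        if prev_end is not None and start - prev_end < V:
            return False  # this burst begins too soon after the previous one
        # Greedily absorb every fault within the window [start, start + beta).
        j = i
        while j + 1 < len(times) and times[j + 1] < start + beta:
            j += 1
        prev_end = times[j]
        i = j + 1
    return True
```

For example, with beta = 3 and V = 50, the fault times 10, 11, 12, 100 pass the check, while 10, 11, 12, 13 do not, because the fault at time 13 would start a new burst only one step after the previous one.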

The proof of the main theorem will show, as a side result, that our sim-
ulation is a $(\beta, V)$-tolerant simulation of M2 by M1. We assume that the
output of M2 is 0 or 1 written in cell 0. It is time to define more precisely
the concepts connected with recovery.
5.1 Annotated history

Let us analyze the kind of histories that are possible with sparse bursts 
of faults. Recall the definition of (possibly centrally consistent) annotated 
configuration in Definition 4.7. 

Definition 5.2 (Annotated history). An annotated history is a sequence
of annotated configurations if its sequence of underlying configurations is a
$(\beta, V)$-trajectory, and it satisfies some additional requirements given below.

If the head is in a free cell, in normal mode, then the time (and the 
configuration) will be called distress-free. If the annotated configuration at a 
certain time is centrally consistent, then we call that time centrally consistent. 
A time that is not distress-free and was preceded by a distress-free time will 
be called a distress event. 

Consider a time interval [t, t + u) starting with a distress event and ending 
with the head becoming free again. It is called a relief event of duration u 
if the only possible island that remains from the distress area is due to some 
burst that occurred at a time intersecting [t, t+u). Moreover, if such an island 
exists, then the sweep direction from before the distress event is preserved, 
except when the island is outside the extended base colony — then it will be 
reversed. 

The extent of a relief event is the maximum size interval covering the 
distress area during the distress. 

Recall the definition of the parameter Kr in (4.4). The additional re-
quirements for annotated history are:

(a) Islands are only created by noise. Stains and the distress area start out 
as islands. 

(b) Each distress event is followed immediately by a relief event, of duration 
< 3Kr and extent < 3E. 

(c) If a distress-free configuration has Sw > TfSt, then the base colony 
contains no stains from earlier work periods. 



Lemma 5.6 (Recovery) will be a crucial step towards the proof of the main 
theorem. Before spelling it out and proving it, we provide some preparatory 
lemmas. 
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5.2 Undisturbed recovery 

The idea of the proof of relief from damage is the following. If alarm is
called and the recovery process is allowed to complete, then it carries out 
the needed correction, as guaranteed by the Correction Lemma 4.9. Most 
complications are due to the fact that the state after a burst is arbitrary. 

When the mode is normal then zigging will make sure that the effect 
is limited to near the island where the burst happened: for example, the 
direction of a sweep cannot be changed in the middle of the workspace, since 
then zigging would notice this and call alarm. 

But the mode after the burst can be the recovery mode, with arbitrary 
values for all fields. Moreover, a new burst may occur after an alarm, at an 
arbitrary stage of the recovery. 

In this section, we address the cases in which the two bad effects do not combine:
either an alarm is called and completes without a new burst intervening, or
a burst occurs at a distress-free time. Recall the definitions of the constants
$K_R$ in (4.4) and Z in (2.2).

Lemma 5.3 (Undisturbed alarm). Suppose that in an annotated history,
alarm sounds at a time when the front of the healthy configuration $\xi$ is at
a distance at most 2Z from the head, and the distress area does not extend
over more than a total size of 2Z. Suppose also that no burst occurs in the
next $K_R$ steps. Then the annotation of the history can be extended so that
relief comes in fewer than $K_R$ steps, while no more than 2E cells are added
to the distress area before it disappears.

Proof. Assume that the conditions of the lemma hold. Let x denote the 
position of the head at the moment when alarm is called. Let us follow the 
recovery procedure, to show how the relief is achieved. 

After the alarm, in the first two sweeps of the recovery procedure, the interval
$[x - 7\beta, x + 7\beta)$ is created, and then an interval R is opened that extends the
distress area. For the procedure to succeed, the condition of the Correction
Lemma 4.9 must hold that in one half of R no more than $3\beta$ cells are empty.
This is trivially true even when the alarm is called on the very first few steps
of the simulation (since we have assumed that the address fields of the base
colony and its two neighbors are nonempty).

Footnote (on the distance bound 2Z in Lemma 5.3): The worst case occurs when
the front is within $Z - 4\beta$ cells of a colony boundary and the head, while
zigging, visits the neighboring colony where, within the first $4\beta$ steps, a
burst occurs and puts the state into marking (right locating branch). Alarm
will be called closer to the front, and the distress area can grow by up to
$(\mathit{Rec}.Z - 4\beta) + \beta$.

Recall the notation R = [a, b) and R of the Correction Lemma 4.9. In its 
proof, we used $m_\sigma$ to denote the multiplicity of the plurality sweep $\sigma$ within
the interval $[a + 3\beta, b - 3\beta)$. Without loss of generality, assume that the
direction of $\sigma$ is 1. We will have

$$\xi.\mathit{front} < a + 1.3E = b - 0.7E,$$

and therefore $\xi.\mathit{front}$ is not to the right of R. Indeed, the assumptions of the
lemma along with the definitions of Z, E, Rec.Z in (2.2), (3.1) and (4.2) imply
that x is not further than 0.04E < 0.1E from $\xi.\mathit{front}$. Now the claim follows
from property 2(i)b of the recovery procedure.

Furthermore, at least $m_\sigma$ right-sweeping cells in R will be on the left of
$\xi.\mathit{front}$. As a majority among not fewer than $E - 3\beta$ cells, $m_\sigma \ge (E - 3\beta)/2$.
This shows that $\xi.\mathit{front}$ is not to the left of R, hence R = R. It follows that
the recovery procedure erases the marks in the distress area and rewrites all
island cells in R, so that the distress area and the islands can be erased and
relief is obtained within $K_R$ steps. □
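
The arithmetic behind the bounds used in this proof can be spelled out as follows. This is only a restatement, under the assumption (taken from the description of the recovery rule) that the interval R = [a, b) has length 2E; the symbols are those of the proof, and the display assumes the amsmath package.

```latex
\begin{align*}
b - a = 2E \;&\Longrightarrow\; a + 1.3E = (b - 2E) + 1.3E = b - 0.7E,\\
\xi.\mathit{front} < a + 1.3E \;&\Longrightarrow\; \xi.\mathit{front} < b
  \quad\text{(the front is not to the right of } R\text{)},\\
m_\sigma \ge \tfrac{1}{2}(E - 3\beta) \;&\Longrightarrow\;
  \text{at least } \tfrac{1}{2}(E - 3\beta) \text{ cells of } R
  \text{ lie to the left of } \xi.\mathit{front}.
\end{align*}
```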

Lemma 5.4 (Burst). Assume that the history has been annotated up to a
time when a burst, creating an island $J_0$, occurs at a distress-free time. Then
the burst is followed by a relief event of duration $< K_R + Z$ and extent $< 3E$.

Proof. We consider various situations after the burst. Recall that we called
an interval R of length 2E aligned if in a healthy configuration satisfying the 
present one, its ends have addresses divisible by E (equivalently, if its end 
positions as absolute integers are divisible by E). 

Let $\xi$ denote the healthy configuration that is part of the annotation at the
time of the burst. Since the burst occurs at a distress-free time, the head is
within Z from $\xi.\mathit{front}$ when it happens. In what follows, we will sometimes
refer to $\xi.\mathit{front}$ of this moment as just the front.

1. Assume first that the mode immediately after the burst is normal. 

Without loss of generality, assume that the sweep was to the right. We
start at some position x that is either in island $J_0$ or next to it. Now
the head zigs backward and forward Z steps (see (2.2)), with respect to
the sweep direction, between any two sequences of $Z - 4\beta$ forward-moving
steps. In any of these, it may discover an incoordination and call alarm,
in which case Lemma 5.3 becomes applicable.

1.1. Assume first that the burst does not change the sweep and address.

In this case, the head will continue its forward sweep, with just possibly
changed zigging. While hitting elements of the island $J_0$ it may sense
incoordination and call alarm. If this happens then Lemma 5.3 applies,
since we are at most $Z + 3\beta$ steps behind the front, and at most $3\beta$ steps
ahead of it.

Before the head manages to traverse $J_0$, it may hit another island causing
an alarm. The point where this alarm can be called may be at most
$Z - 3\beta$ steps ahead of the front, so Lemma 5.3 applies again.

In case of alarm, the recovery area will cover the island $J_0$, and if it was
triggered by another island then that one, too. Any points of island $J_0$
traversed during the progress and zigging can be erased from the island,
and after a complete cycle of zigging occurs the untraversed parts of $J_0$
may stay as an island.

How can it happen that not the whole $J_0 =: [a, a + \beta)$ is traversed?
In this case, the next backward zig does not cross $J_0$, so it starts from
$> a + Z$. To get there we need $\xi.\mathit{front} > a + Z - (Z - 4\beta) = a + 4\beta$
when we start.

1.2. Assume now that the burst changes Addr or Sw.

Lemma 4.6 says that unless c.Addr is in a certain interval of length $4\beta$,
the pair (c.Addr, c.Sw) determines uniquely the Addr value coordinated
with it. Therefore, if the burst changes Addr then this will be
noticed as soon as the head leaves the island and possibly this interval,
causing an alarm, so Lemma 5.3 applies.

Similarly, if Sw changes by more than 1 then it will be noticed as soon
as the head leaves the island or the area of size $4\beta$ mentioned. If it just
changes by 1 then the head reverses direction, and the incoordination
may not be immediately noticed when stepping off the island. But
zigging will take us all the way across $J_0$ and therefore if alarm does
not sound, $J_0$ can be erased (this can only happen if $J_0$ is at the end of
the colony where the sweep would have changed anyway). Indeed, just
as above, we can see that the only possibility that the next backward
zig does not cross $J_0$ would be that the front is to the left of $a + \beta$ by
at least $4\beta$. But this is impossible since, as the original sweep is to the
right, the head was not right of the front when the island occurred.

2. Suppose that the mode after the burst is Recovering. 

In the recovery rule, as defined in Section 4.2, the head moves around in 
an aligned interval R of size 2E. The marked area is extended in stage 
Marking, and shrunk in the Mopping stages. If the stage after the burst
is a Planning stage, then alarm is called almost immediately (possibly passing
through some island cells first), since we assumed a start from a distress- 
free configuration, in which by definition no non-island cells are marked. 
Then Lemma 5.3 applies. 

2.1. Suppose that the stage after the burst is Marking. 

By its design, the marking rule marks new cells while also using a rule 
similar to Zigzag(d), but moving (and marking) at most $\mathit{Rec}.Z - 4\beta$ cells
while moving in one direction. Alarm is only called when an alignment 
problem is found, or non-marked cells are found where marked ones are 
expected. Therefore alarm can only occur within the first 2Rec.Z steps 
after a burst. Indeed, zigging checks alignment with the cells marked 
earlier. If alignment inconsistency is not found then it will not be found 
later either. 

It follows that in case of a new alarm, Lemma 5.3 applies, and the recovery
reprocesses all cells marked after the burst. 

2.2. Assume that after the burst a mopping stage is entered. 

The mopping stages only erase marks, and apply c.Core $\leftarrow$ c.Rec.Core.
Since we started in a distress-free configuration, we had c.Rec.Core = 
everywhere but in the islands. Marking will not change the c.Core value
anywhere else. It follows that within at most as many cells as the total 
length of islands possibly encountered, there is either an alarm due to 
not seeing marks, or return to normal mode. From there on, the analysis 
of part 1 applies. 

□
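
The zigging pattern used in part 1 of the proof above can be illustrated with a toy model. The sketch assumes a right sweep in which each cycle consists of $Z - 4\beta$ forward steps followed by a zig of Z steps backward and Z steps forward; the function names and the representation of the head path as a list of integer positions are assumptions for this example only. Under this model, consecutive backward zigs overlap in a stretch of length $4\beta$, which matches the offset $4\beta$ appearing in the argument about an untraversed part of an island.

```python
from typing import List, Tuple


def head_path(start: int, Z: int, beta: int, cycles: int) -> List[int]:
    """Head positions during a right sweep with zigging (toy model)."""
    if not Z > 4 * beta > 0:
        raise ValueError("the model assumes Z > 4 * beta > 0")
    pos = start
    path = [pos]
    for _ in range(cycles):
        # Z - 4*beta forward steps, then Z steps back and Z steps forward.
        moves = [1] * (Z - 4 * beta) + [-1] * Z + [1] * Z
        for step in moves:
            pos += step
            path.append(pos)
    return path


def backward_zig_ranges(start: int, Z: int, beta: int,
                        cycles: int) -> List[Tuple[int, int]]:
    """(leftmost, rightmost) cell visited by each backward zig."""
    ranges = []
    for k in range(1, cycles + 1):
        turn = start + k * (Z - 4 * beta)  # position where the k-th zig begins
        ranges.append((turn - Z, turn))
    return ranges


if __name__ == "__main__":
    zigs = backward_zig_ranges(start=0, Z=20, beta=2, cycles=3)
    print(zigs)
    for (_, hi_prev), (lo_next, _) in zip(zigs, zigs[1:]):
        print("overlap length:", hi_prev - lo_next)
```

In this model a backward zig that begins at position p reaches down to p - Z, so it fails to reach a cell a only if p > a + Z, which is the condition examined in case 1.1.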



5.3 Disturbed recovery 

We say that recovery is disturbed when a burst occurs during a recovery
process started by an alarm. Since bursts are rare, the alarm in question
must have happened without a burst, which could occur only on
encountering some island $J_1$. Since there was no recent burst (within the last
V steps), this encounter could have occurred only during the transfer phase.

Lemma 5.5 (Disturbed recovery). Assume that the history has been admissible
up to a time when the head steps on an island $J_1$, in a transfer sweep
TfSt($\delta$), $\delta \in \{-1, 1\}$, or in the first zigging into the neighbor colony
immediately after this sweep.

Then the annotation can be extended such that a relief event of duration
$< 3K_R$ and extent $< 3E$ occurs.

Proof. Without loss of generality, assume $\delta = 1$, that is, the direction of the
transferring sweep is to the right. Let $\xi$ denote the healthy configuration
that is part of the annotation at the time when an alarm occurs at some cell
$x_0$.

In what follows, we will sometimes refer to $\xi.\mathit{front}$ of this moment as just
the front.

In the transferring phase all structural (that is c.Core) information in the
non-island cells we pass is computable from the field Core of the state (see
Lemma 4.6). Therefore if no alarm or burst occurs while the head is on $J_1$,
then the part of $J_1$ that was passed can be deleted from the island.
If no burst occurs within the next $2K_R$ steps, then Lemma 5.3 is applicable.
From now on, we assume that a burst occurs during this time, creating an
island $J_0$.

If the burst occurs while the head is on island $J_1$, then Lemma 5.4 is
applicable. Assume therefore that alarm occurs at some time while the head is
on $J_1$ (or over a cell next to it), but a burst occurs only at some later time
$t_1$. Let $D(t)$ denote the interval of marked cells at time t, created by the
recovery process started by the alarm.

1. Suppose that a new alarm will be called within 2Z steps after the burst at
some cell x.

Then zigging implies that we are also not removed beyond distance Z from
$D(t_1)$ at the time of the alarm.

After the burst, we are within distance $\beta$ from $D(t_1)$. If the recovery before
the burst did not determine $z_1$ yet, then the size of $D(t_1)$ is at most $7\beta$.
Since after the burst we are burst-free for a while, Lemma 5.3 guarantees
the relief. Otherwise, recovery after the initial alarm has already defined
cell $z_1$ of the recovery procedure. Then,

$$x_0 \in [z_1 - 0.3E, z_1 + 0.2E). \tag{5.1}$$

A new recovery area $R' = z_1' + [-E, E)$ will be created. The alignment
guarantees $z_1' = z_1$, $z_1 - E$ or $z_1 + E$. The direction $\delta$ computed after the
second alarm is necessarily the same as the one computed after the first
one.

Now, if alarm after the burst is called in the same interval (5.1) as the
initial alarm then the same recovery interval will be opened again, hence
$z_1' = z_1$. If $x < z_1 - 0.3E$, then $z_1' = z_1 - E$. Finally, if alarm after the
burst is called to the right of $z_1 + 0.2E$, then $z_1' = z_1 + E$.

If $z_1' = z_1$, then all cells of $D(t_1)$ will be reprocessed, and the recovery
succeeds.

1.1. Assume $z_1' = z_1 - E$.

If $\xi.\mathit{front} < z_1' + 0.3E$ then R' = R'. Then after the new recovery finishes,
marked cells in the interval $D(t_1) \setminus R'$ of length < E may still be there.
However, the mode after the recovery is normal, and we have assumed
that the sweep direction is to the right. Therefore, these marked cells
will be reached, and alarm will be triggered. Indeed, even if the front
is at the colony boundary, and $z_1$ is the colony boundary (in which case
the head is turning left), within $Z - 4\beta$ steps the zigging will start, and
the head will pass over $z_1$, where marked cells may exist. If they exist
they trigger alarm, and an undisturbed recovery, with a recovery interval
equal to R, will eliminate the remaining marks.

If $\xi.\mathit{front} > z_1' + 0.3E = z_1 - 0.7E$, then R! = (i?0o-. Once the recovery
over R' finishes, the head will be left on its right end, where alarm will
be called, since marked cells will be found. Then, a new undisturbed
recovery cleans the remaining marks as in the previous case.

Footnote (to case 1.1): We allow the head to zig into the neighbor colony in
order to definitely reach all remaining marked cells.

1.2. Consider the case $z_1' = z_1 + E$.

Now the new recovery interval does not contain the front 0.2E deep
inside. Indeed, alarm at $x_0$ was called on an island either when moving
right, or while zigging into a right neighbor colony (this zigging goes at
most $4\beta$ deep). Therefore, the position where alarm is called for the first
time can only be to the right of the front within a distance not exceeding
$Z - 4\beta$ (see Fig. 5).

[Figure 5: a number line marking, from left to right, the points $z_1 - E$,
$\xi.\mathit{front}$, $x_0$, $z_1$, $z_1 + 0.2E$ and $z_1 + E$, with a span labeled $< Z - 3\beta$.]

Figure 5: Point $x_0$ where the alarm is called once the head encountered the
island is always to the left of $z_1 + 0.2E$, therefore $\mathit{front}(\xi) < z_1 + 0.2E$ as
well.

Since $x_0 \in [z_1 - 0.3E, z_1 + 0.2E)$, the front cannot be in
$[z_1' - 0.3E, z_1' + 0.3E)$, and the Correction Lemma 4.9 yields B! = {R^)^.

Once the recovery completes, the head is put into $z_1$, where a new alarm
will be called when the marked cells are discovered during zigging. The
new recovery area after this alarm is R again, and the process eliminates
the remaining parts of $D(t_1)$, leading to relief.

2. Suppose that alarm will not be called within 2Z steps after the burst. 

2.1. Suppose that the burst brings the machine to normal mode. 

If Jo ^ ^(^i) then the proof of Lemma 5.4 is applicable. Otherwise, as 
zigging meets the marked cells in $D(t)$ within 2Z steps, a new alarm will
be called, and part 1 is applicable. 

If a burst occurs while the head is near the boundary of the recovery 
interval, then it may leave an island outside the recovery interval (within 
distance of $E + \beta$ from $z_1$), provided that after the burst the recovery
continues seamlessly where it was interrupted. 

2.2. Suppose that the stage after the burst is Marking. 

If the recovery process continues the old one seamlessly, then it termi- 
nates with success. 

Otherwise, since the marking stage employs zigging, alarm occurs within
$2\,\mathit{Rec}.Z < 2Z$ steps. From then on, an analysis identical to the one in
the proof of Lemma 5.4 shows that the cells marked after the burst will
be contained in the recovery area created by the new alarm. The analysis
of part 1 applies to what happens afterwards.

2.3. Suppose that the stage after the burst is a Planning stage or Mopping.

Since these stages expect to walk over a recovery area, they must seam- 
lessly continue what went on before, except for changing the state and 
the content of some cells in an island — otherwise alarm occurs immedi- 
ately. 

If the burst occurs during Planning$_1$ and it changes what is computed,
then Planning$_2$ will notice this and trigger alarm. Since this alarm occurs
in the existing marked area $D(t_1)$, the analysis of part 1 still applies.

If the burst occurs during Planning$_2$ or Mopping then it either triggers
alarm, in which case the above analysis applies, or it allows the recovery
process to end, with the lasting effect of the burst restricted just to the
island $J_0$.

Lemma 4.9 guarantees that whatever assignments c.Core $\leftarrow$ c.Rec.Core
were made in the mopping stage, they are admissible, even if mopping
is interrupted by a burst (and then continued as mopping).

To bound the duration of relief, we note that in the worst case in part 1, the
first recovery initiated by the island can reach only up to mopping. After
the burst, at most two other full recovery cycles occur with at most $2Z \ll E$
steps before them. Hence the total duration of the relief is $< 3K_R$.

□
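
The case split on the position of the second alarm in part 1 of the proof above can be written as a small function. This is a sketch under the assumptions stated in the proof (the earlier recovery interval is centered at $z_1$, and the alarm position x is compared with the interval (5.1)); the function names are hypothetical, positions are modeled as integers, and the comparisons are multiplied by 10 to avoid floating-point rounding. Treating an alarm exactly at $z_1 + 0.2E$ as falling to the right of (5.1) follows from that interval being half-open.

```python
from typing import Tuple


def new_center(x: int, z1: int, E: int) -> int:
    """Center z1' of the recovery interval opened by an alarm at cell x."""
    offset = 10 * (x - z1)      # compare 10*(x - z1) with -3E and 2E
    if offset < -3 * E:         # x < z1 - 0.3E
        return z1 - E
    if offset < 2 * E:          # x lies in [z1 - 0.3E, z1 + 0.2E), interval (5.1)
        return z1
    return z1 + E               # x >= z1 + 0.2E


def recovery_interval(center: int, E: int) -> Tuple[int, int]:
    """End points (lo, hi) of the half-open interval center + [-E, E)."""
    return (center - E, center + E)
```

For example, with $z_1 = 1000$ and $E = 100$, an alarm at cell 969 gives the interval [800, 1000), while an alarm at cell 970 reopens [900, 1100).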

5.4 Finishing the proof 

The following lemma implies the main theorem. 

Lemma 5.6 (Recovery). Assume that machine $M_1$ starts working on a tape
configuration of the form (^^{x). Every $(\beta, V)$-noisy trajectory of $M_1$ can be
annotated.

Proof. Assume that the history has been annotated in an admissible way up 
to a certain time. First we show that in case a distress event occurs, the 
annotation can be extended to keep property 5.2 (b). Then using this, we
will show that in case of no distress, the annotation can also be extended in
an admissible way while keeping the other properties. 

1. Consider property 5.2 (b). 

If a distress event occurs due to a burst then Lemma 5.4 applies. 
Assume now that a distress event occurs due to stepping onto an island $J_1$.

Assume first that no burst occurs in the following $3K_R$ steps. Now, if no
alarm sounds within 2Z steps, then zigging guarantees that the part of 
the island passed over can be replaced with a stain. (The only way not to 
pass some part is when the island is in a neighbor colony at distance $\le 4\beta$
from the boundary: zigging may reach just a part of it.) If alarm sounds, 
then Lemma 5.3 is applicable. 

If a burst does occur within the following $3K_R$ steps, then no
burst has occurred recently (within V steps). There could not
have been any distress in the last sweep: indeed, any earlier island on 
which the head could have stepped would have been eliminated (at least 
its part in the path of the head) without or with alarm, as seen in the 
previous paragraph. But then the only way to step on an island is under 
the conditions of Lemma 5.5. 

2. Consider property 5.2 (a). 

Assume an admissible annotated history until a distress-free time t. We 
will show that by just keeping the islands constant, the annotation is
extendable in an admissible way to $t + 1$. In particular, there will still be a
satisfying healthy configuration. 

Looking at Definition 4.2 of healthy configurations, most properties are 
obviously preserved in each step by just the form of the transition rule. 
The exceptions are the property which requires that the c.Drift track holds
constant values in certain intervals at certain times, and the property which
requires that the c.Info and c.State tracks hold valid codewords of the code
$(\varphi_*, \varphi^*)$ defined in Section 3.1.

So we are only concerned with the recomputation of the values of c.State,
c.Info, c.Drift in the base colony, during the computation phase, and then
the transfer of c.State during the transfer phase. (The value of c.Drift in
the neighbor colonies is inherited from earlier, and its spreading from Drift
is watched over by the coordination requirement: a change would trigger
alarm at zigging.)

Recall the structure and the tasks of the computation phase given in Section 3.3.
By the properties of annotated configurations in Definition 4.7, in the base
colony, besides a possible island, there is at most 1 more stain of size $\beta$,
and possibly more stains, all contained in a single interval of size $E + \beta$.
(The bound comes from the length of possible penetration of the head in
a neighboring colony while faults could occur.) These last stains can be
ignored, since our code is defined in such a way that it places a codeword
of the $(\beta, 2)$ burst-error-correcting code $(\upsilon_*, \upsilon^*)$ at a distance 1.1E away
from the colony boundaries.

The recovery rules do not change the c.Hold, c.State and Info tracks, and
given that Sw < TfSt, they do not change the c.Drift track either. Therefore,
since there are at most 2 stains at distance 1.1E from the boundaries, and
our code is $(\beta, 2)$ burst-error-correcting, the result of decoding from the
Info and State tracks during the computation phase will be the same as if 
the configurations had been stainless all along. 

Even if a fault causes the head to step into a neighbor colony that can be 
empty and set Sw = TfSw(±1), after at most 2Z steps, the head will step
back inside the colony it came from, and it will call alarm there. Since
$E \gg Z$, the distress area will contain this segment of cells entirely.

Any distress event will directly affect at most one of the three repetitions
of the computation phase: the configuration is centrally consistent
during the others. Consequently, the correct values will be stored
in track c.Hold[i], $i \in \{1, 2, 3\}$, for all but one i. If the sweep of the
field majority computation during the encoding stage of the computation
phase is distress-free, then every cell will receive the correct value
maj(c.Hold[1 ... 3]). But even if distress occurs in this sweep, relief
guarantees that all cells but the ones in the island of the burst that caused the
distress will hold the correct value.

The same argument proves the property that the newly computed c.State
will be correctly transferred to the extended base colony in the transfer 
phase. 

3. Consider property 5.2 (c).

From the above argument it is clear that the only possible stain remaining
in the base colony is the one created by a burst in the current work period.
On the other hand, we can add stains and islands to neighbor colonies. 

Let us see what is the farthest distance to which we can intrude into a 
neighbor colony and leave islands. With zigging, the head can penetrate 
at most $4\beta$ cells into the neighbor colony, where it can find an island
causing alarm. A burst occurring anywhere in the recovery interval created
by this alarm may leave a stain anywhere within distance of $E + \beta$ from
the colony boundary (where the recovery interval is centered). 

□
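
The majority step described in part 2 of the proof above can be sketched as follows. The representation of the three Hold tracks as Python lists of integers and the function names are assumptions for this illustration. The point is only that a cell-wise majority over three copies returns the correct value whenever at most one copy is wrong in that cell.

```python
from typing import List, Sequence


def maj3(a: int, b: int, c: int) -> int:
    """Majority of three values; if all three differ, b is returned arbitrarily."""
    return a if a == b or a == c else b


def majority_track(hold: Sequence[Sequence[int]]) -> List[int]:
    """Cell-wise majority over exactly three tracks of equal length."""
    if len(hold) != 3:
        raise ValueError("expected exactly three tracks")
    return [maj3(a, b, c) for a, b, c in zip(*hold)]
```

If one of the three repetitions of the computation was affected by distress, the other two tracks still agree cell by cell, so the output equals their common content regardless of which track was affected.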

Lemma 5.7 (Simulation). Under the conditions of Lemma 5.6, via some
simulation function $\Phi^*$ (to be defined in the proof of the present lemma),
the movement of the base colony corresponds to the head movement of the
simulated machine $M_2$ (scaled up by a factor of Q). Whenever the sweep in
the free cells of the base colony is not one of switching to a new work period,
the array of c.State values there decodes into the state of $M_2$, and the array
of c.Info values decodes into the current tape cell symbol of $M_2$.

Proof. Lemma 5.6 gives us an admissible history. At all distress-free times, it
also defines uniquely a base colony. For distressed times, let the base colony
be equal to that of the last distress-free time. Once a base colony is given
for each configuration, the simulation function is also uniquely defined: we
decode the simulated cell content of each cell of $M_2$ from the corresponding
colony, and the simulated state from the c.State array of the base colony.
Part 2 of the proof of Lemma 5.6 shows that the decoding indeed defines a
trajectory of $M_2$. □

Proof of Theorem 1.13. The statement follows essentially from Lemma 5.7,
adding only the following. Let f be a projection from the alphabet of $M_1$
to the alphabet of $M_2$, defined by f(s) = s.c.Info. Consider now the cell
at the origin of the tape. Then, relation (1.3) holds due to step 1c of the
computation procedure in Section 3.3.

What are all the lower bounds on Q? Since the program of the machine
$M_2$ must fit in a colony, Q is lower-bounded by p2. Definitions (2.2)
and (3.1) show $E = O(\beta)$. We needed to be able to define the code $(\varphi_*, \varphi^*)$ in
Section 3.1 fitting into the part of the colony away by 1.1E from the boundary.
These requirements are satisfied with Q depending linearly on $\log|\Sigma_2|$,
$\log|\Gamma_2|$ and E.

The computation phase lasts $O(Q)$ steps, where we also used the requirement
in Definition 1.11. The transferring of the State into the neighboring
colony will need Q sweeps, that is $O(Q^2)$ steps. Therefore the constant V
bounding the time overhead of machine $M_1$ is $V = O(Q^2)$. □
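
The overhead estimate can be summarized in a short display. The names $T_{\mathrm{comp}}$ and $T_{\mathrm{transfer}}$ are introduced here only for illustration, and the $O(Q)$ cost per sweep is the assumption implicit in counting Q sweeps as $O(Q^2)$ steps; the display assumes the amsmath package.

```latex
\begin{align*}
T_{\mathrm{comp}} &= O(Q), \\
T_{\mathrm{transfer}} &= \underbrace{Q}_{\text{sweeps}} \cdot
  \underbrace{O(Q)}_{\text{steps per sweep}} = O(Q^2), \\
V &= O\bigl(T_{\mathrm{comp}} + T_{\mathrm{transfer}}\bigr) = O(Q^2).
\end{align*}
```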

6 Conclusions and future work 

In this paper we have shown that for any Turing machine there is one that can 
simulate it while correcting occasional violations of its own transition
function. The procedure recovering the simulation structure is based on an
organization in which any group of cells affected by the faults is surrounded
by cells that conserve some valid traces of the computation.

We hope to use this construction, similarly to [4], as a building block 
in a more complex construction of a Turing machine that can resist faults 
occurring independently with small probability. 

To the best of our knowledge, this is the first construction of a reliable
sequential machine. An interesting question is whether Turing machines are the
simplest machines that can perform universal computation under isolated
bursts of noise. It seems that simpler models, like the counter machines
of [12], are insufficient, but there are some interesting questions open
concerning the nature of their insufficiency.